Welcome, Bob. Thanks for sharing the interesting story.

Best,
Liya Fan

On Wed, Sep 23, 2020 at 12:28 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Welcome to the community, Bob.
>
> On Tue, Sep 22, 2020 at 12:27 PM Bob Tinsman <bobt...@pacbell.net> wrote:
>
> > I'd like to introduce myself, because I've had an interest in Arrow for a
> > long time and now I have a chance to help out. Up until now, I haven't
> > really contributed much in open source, although I've been an avid
> > consumer, so I'd like to change that!
> >
> > My main areas of work have been performance optimization, Java, databases
> > (mostly relational), and optimizing/refactoring architecture, but I also
> > have some C/C++ background, and I'm a quick learner of new languages.
> >
> > The reason that I'm so interested in Arrow is that I've already created
> > two in-memory columnar dataset implementations for two different companies,
> > so I'm a believer in the power of this model, although I came to it from a
> > different perspective. I was just watching this discussion with Wes and
> > Jacques: Starting Apache Arrow
> >
> > Wes lays out two phases of Arrow:
> > - Phase one: Arrow used as a common format
> > - Phase two: Arrow used for actual calculation
> >
> > Because I was working on my own, I skipped to phase two.
> >
> > I worked for an online marketing survey company called MarketTools in the
> > early 2000s. Survey results were stored in SQL Server, and we had to
> > implement crosstabs on the data; for example, if you wanted to see answers
> > to survey questions broken down by age, gender, income range, etc.
> > The original implementation would generate some pretty hairy SQL, which
> > got pretty slow if there were a lot of questions on the crosstab. I thought,
> > "Why are we asking the DB to run multiple queries on the same data when we
> > could pull it into memory once, then do aggregate calculations there?" That
> > produced a 5x speedup in running the crosstabs.
> >
> > At my most recent company, I created a new in-memory dataset implementation
> > as the basis for an interactive data analysis tool. Again I was working with
> > mostly relational databases. I was able to push the scalability of the
> > in-memory columns a lot more using dictionaries. I also developed a hybrid
> > engine combining SQL generation and in-memory calculation, sort of like what
> > Spark is doing.
> >
> > If I had known about Arrow, I would definitely have used it, but it wasn't
> > around yet. You guys have accomplished a lot--congrats on your 1.0.0
> > release, by the way!
> >
> > I'm starting out by grokking all the source and docs, and looking at JIRA
> > issues that I could potentially work on, but I'm looking forward to helping
> > out however I can.