Re: Hello to the Arrow dev community

Fan Liya Wed, 23 Sep 2020 04:40:52 -0700

Welcome, Bob.
Thanks for sharing the interesting story.

Best,
Liya Fan



On Wed, Sep 23, 2020 at 12:28 PM Micah Kornfield <[email protected]>
wrote:

> Welcome to the community, Bob.
>
> On Tue, Sep 22, 2020 at 12:27 PM Bob Tinsman <[email protected]> wrote:
>
> > I'd like to introduce myself, because I've had an interest in Arrow for a
> > long time and now I have a chance to help out. Up until now, I haven't
> > really contributed much in open source, although I've been an avid
> > consumer, so I'd like to change that!
> > My main areas of work have been performance optimization, Java, databases
> > (mostly relational), and optimizing/refactoring architecture, but I also
> > have some C/C++ background, and I'm a quick learner of new languages.
> >
> > The reason that I'm so interested in Arrow is that I've already created
> > two in-memory columnar dataset implementations for two different
> > companies, so I'm a believer in the power of this model, although I came
> > to it from a different perspective. I was just watching this discussion
> > with Wes and Jacques: Starting Apache Arrow
> >
> > [Link preview: Starting Apache Arrow. Our CTO Jacques Nadeau sat down for
> > a fireside chat with Wes McKinney, discussing the past, present, and
> > future...]
> >
> > Wes lays out two phases of Arrow:
> > - Phase one: Arrow used as a common format
> > - Phase two: Arrow used for actual calculation
> >
> > Because I was working on my own, I skipped to phase two.
> >
> > I worked for an online marketing survey company called MarketTools in the
> > early 2000s. Survey results were stored in SQL Server, and we had to
> > implement crosstabs on the data; for example, if you wanted to see
> > answers to survey questions broken down by age, gender, income range, etc.
> > The original implementation would generate some pretty hairy SQL, which
> > got pretty slow if there were a lot of questions on the crosstab. I
> > thought, "Why are we asking the DB to run multiple queries on the same
> > data when we could pull it into memory once, then do aggregate
> > calculations there?" That managed to produce a 5x speedup in running the
> > crosstabs.
> >
> > In my most recent company, I created a new in-memory dataset
> > implementation as the basis for an interactive data analysis tool. Again
> > I was working with mostly relational databases. I was able to push the
> > scalability of the in-memory columns a lot more using dictionaries. I
> > also developed a hybrid engine combining SQL generation and in-memory
> > calculation, sort of like what Spark is doing.
> >
> > If I had known about Arrow, I would have definitely used it, but it
> > wasn't around yet. You guys have accomplished a lot--congrats on your
> > 1.0.0 release, by the way!
> >
> > I'm starting out by grokking all the source and doc, and looking at JIRA
> > issues that I could potentially work on, but I'm looking forward to
> > helping out however I can.
> >
>
