> “Apache Arrow is a format and compute kernel for in-memory data”

I like this but no one ever knows what "in-memory" means (or they just
think 'data is always in memory').  How about...

"Apache Arrow is a format and compute kernel for zero-copy processing
and sharing of data."

or...

"Apache Arrow is a format and compute kernel for processing and
sharing data without serialization overhead."

Although marshalling[1] would probably be a more precise word, it is
not as well known.

[1] https://en.wikipedia.org/wiki/Marshalling_(computer_science)

On Mon, May 17, 2021 at 9:36 AM Mauricio Vargas
<mauri...@ursacomputing.com> wrote:
>
> a few ideas
>
> github.com/apache/arrow - Apache Arrow is an efficient library for big data
> processing and sharing
>
> github.com/apache/arrow - Apache Arrow is a computational tool for
> processing, storing and sharing large datasets
>
> github.com/apache/arrow - Apache Arrow is a fast and simple library for
> big data analytics
>
> *github.com/apache/arrow - Apache Arrow is a powerful workhorse for
> analytic operations on modern hardware*
>
>
> On Mon, May 17, 2021 at 3:13 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
>
> > Alright, well, whatever it is, it must fit into one breath. If the
> > high-concept pitch is successful, people will stick around for the full
> > pitch.
> >
> > Words such as “platform” and “enable” are noise. You say “platform”, they
> > start to say “what exactly do you mean by platform”, the elevator doors
> > open, and they’re gone.
> >
> > “Apache Arrow is a format and compute kernel for in-memory data”
> >
> >
> > > On May 17, 2021, at 12:03 PM, Eduardo Ponce <edponc...@gmail.com> wrote:
> > >
> > > One more suggestion for the bucket:
> > > "Apache Arrow is a computational platform for efficient in-memory data
> > > representation and processing."
> > >
> > > On Mon, May 17, 2021 at 2:49 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > >> I think less is better in the description, but unfortunately the
> > >> association of Arrow as being "just a data format" has been actively
> > >> harmful in some ways to community growth. We have a data format, yes,
> > >> but we are also creating a computational platform to go hand-in-hand
> > >> with the data format to make it easier to build fast applications that
> > >> use the data format. So the description needs to capture both of these
> > >> ideas.
> > >>
> > >> On Mon, May 17, 2021 at 12:15 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> > >>>
> > >>> I think that the “cross-language development platform for” is noise.
> > >>> (I’m sure that JPEG developers think that JPEG is a “cross-language
> > >>> development platform” too. But it isn’t. It is an image format.)
> > >>>
> > >>> “Apache Arrow is a data format for efficient in-memory processing.”
> > >>>
> > >>> I’ll note that, in marketing speak, we are developing a high-concept
> > >>> pitch [1] here. Every company needs a name, a brand, a high-concept
> > >>> pitch, and a 3- or 4-sentence description. But every Apache project
> > >>> needs these too. It’s worth spending the time on the description,
> > >>> also, and then using them in all the places that we describe Arrow.
> > >>>
> > >>> Julian
> > >>>
> > >>> [1] https://www.growthink.com/content/whats-your-high-concept-pitch
> > >>>
> > >>>
> > >>>
> > >>>> On May 17, 2021, at 7:38 AM, Eduardo Ponce <edponc...@gmail.com> wrote:
> > >>>>
> > >>>> I agree with Nate's and Brian's suggestions, but would like to add
> > >>>> that we can make it a one-liner for more conciseness and consistency
> > >>>> with other Apache projects.
> > >>>> Apologies if it seems I am going around the suggestions loop again.
> > >>>>
> > >>>> "Apache Arrow is a cross-language development platform enabling
> > >>>> efficient in-memory data processing and transport."
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Mon, May 17, 2021 at 10:11 AM Brian Hulette <bhule...@apache.org> wrote:
> > >>>>
> > >>>>> Thank you for bringing this up, Dominik. I sampled some of the
> > >>>>> descriptions for other Apache projects I frequent; the ones with a
> > >>>>> meaningful description have a single sentence:
> > >>>>>
> > >>>>> github.com/apache/spark - Apache Spark - A unified analytics engine
> > >>>>> for large-scale data processing
> > >>>>> github.com/apache/beam - Apache Beam is a unified programming model
> > >>>>> for Batch and Streaming
> > >>>>> github.com/apache/avro - Apache Avro is a data serialization system
> > >>>>>
> > >>>>> Several others (Flink, Hadoop, ...) just have "[Mirror of] Apache
> > >>>>> <name>" as the description.
> > >>>>>
> > >>>>> +1 for Nate's suggestion "Apache Arrow is a cross-language
> > >>>>> development platform for in-memory data. It enables systems to
> > >>>>> process and transport data more efficiently."
> > >>>>>
> > >>>>> On Mon, May 17, 2021 at 5:23 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>
> > >>>>>> It's probably best for the description to limit mentions of specific
> > >>>>>> features. There are some high level features mentioned in the
> > >>>>>> description now ("computational libraries and zero-copy streaming
> > >>>>>> messaging and interprocess communication"), but now in 2021, since
> > >>>>>> the project has grown so much, it could leave people with a limited
> > >>>>>> view of what they might find here.
> > >>>>>>
> > >>>>>> On Mon, May 17, 2021 at 12:14 AM Mauricio Vargas
> > >>>>>> <mauri...@ursacomputing.com> wrote:
> > >>>>>>>
> > >>>>>>> How about
> > >>>>>>> 'Apache Arrow is a cross-language development platform for
> > >>>>>>> in-memory data. It enables systems to process and transport data
> > >>>>>>> efficiently, providing a simple and fast library for partitioning
> > >>>>>>> of large tables'?
> > >>>>>>>
> > >>>>>>> Sorry for the delay, long election day
> > >>>>>>>
> > >>>>>>> On Sun, May 16, 2021, 2:27 PM Nate Bauernfeind <natebauernfe...@deephaven.io> wrote:
> > >>>>>>>
> > >>>>>>>> Suggestion: faster -> more efficiently
> > >>>>>>>>
> > >>>>>>>> "Apache Arrow is a cross-language development platform for
> > >>>>>>>> in-memory data. It enables systems to process and transport data
> > >>>>>>>> more efficiently."
> > >>>>>>>>
> > >>>>>>>> On Sun, May 16, 2021 at 11:35 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> Here's what's there now:
> > >>>>>>>>>
> > >>>>>>>>> "Apache Arrow is a cross-language development platform for
> > >>>>>>>>> in-memory data. It specifies a standardized language-independent
> > >>>>>>>>> columnar memory format for flat and hierarchical data, organized
> > >>>>>>>>> for efficient analytic operations on modern hardware. It also
> > >>>>>>>>> provides computational libraries and zero-copy streaming messaging
> > >>>>>>>>> and interprocess communication…"
> > >>>>>>>>>
> > >>>>>>>>> How about something shorter like
> > >>>>>>>>>
> > >>>>>>>>> "Apache Arrow is a cross-language development platform for
> > >>>>>>>>> in-memory data. It enables systems to process and transport data
> > >>>>>>>>> faster."
> > >>>>>>>>>
> > >>>>>>>>> Suggestions / refinements from others welcome
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Sat, May 15, 2021 at 9:12 PM Dominik Moritz <domor...@cmu.edu> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Super minor issue but could someone make the description on
> > >>>>>>>>>> GitHub shorter?
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> GitHub puts the description into the title of the page and makes
> > >>>>>>>>>> it hard to find it in URL autocomplete.
