Personally, I like the "Feather" name (and actually think it could help
disambiguate the file vs in-memory distinction), but I understand that we
have chosen a certain path (e.g. ".arrow" is the official registered
extension), and have to move on.

However, I think we need to be very careful in how we brand the
alternative, and think proactively about what terminology we want to be
used (including which terms to use in APIs). The "IPC" aspect of the naming
can also become confusing: IPC is a generic term that indicates neither
that this is a *file* format nor that it is related to *Arrow*.

As an example, I just noticed a twitter thread (
https://twitter.com/braaannigan/status/1566715704937676800) that is
promoting the "IPC format". The specific library used here (polars) also
exposes this as a "read_ipc" function.
Other examples:

- In pyarrow, we have a `feather` submodule with `read_feather`/`write_feather`
functions. How do we want to replace this? The current alternative is the
`pyarrow.ipc` submodule (which has functionality to open files), but that
again uses the "IPC" terminology. Are we OK with making this the alternative,
or do we want to add new APIs?
- In pyarrow.dataset, we also use IpcFileFormat for Arrow files. Should we
rename this to `ArrowFileFormat`? (and keep IpcFileFormat as alias)
- In the R arrow package, the non-Feather alternative for `read_feather`
is currently `read_ipc_file`.
- In pandas, there is `read_feather`/`to_feather`. What do we think pandas
should use instead?
- ...
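
To make the "rename and keep the old name as an alias" idea from the list
above concrete, here is a minimal sketch in plain Python (no pyarrow or
pandas needed). Everything in it is a hypothetical stand-in, not the real
pyarrow or pandas implementation: `read_arrow_file` is only one of the
candidate names discussed in this thread, and the returned string is a
placeholder for actual reading logic.

```python
import warnings


class ArrowFileFormat:
    """Hypothetical stand-in for the dataset file format class."""

    def __repr__(self):
        return "<ArrowFileFormat>"


# Keeping the old name as a plain alias means existing code keeps working.
IpcFileFormat = ArrowFileFormat


def read_arrow_file(path):
    """Hypothetical new entry point; a real one would read the file at `path`."""
    return f"table read from {path}"


def read_feather(path):
    """Old name kept as a thin wrapper that points users at the new name."""
    warnings.warn(
        "read_feather is deprecated, use read_arrow_file instead",
        FutureWarning,
        stacklevel=2,
    )
    return read_arrow_file(path)
```

Whether the old name should warn at all, or just stay as a silent alias, is
exactly the kind of decision that would need to be agreed on first.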

Personally, I think we should certainly avoid names that just use IPC (like
`read_ipc`). An alternative could be `read_arrow_ipc`, but if we want to drop
the IPC part (as proposed earlier in this thread, although not yet agreed
on), that would become `read_arrow`/`to_arrow`. That might then be confused
with reading from / converting to in-memory Arrow data or an Arrow stream.
If we want to recommend the "Arrow file" terminology, could APIs like
`read_arrow_file` be used instead?
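
One reason "Arrow file" can be treated as a distinct thing from a stream is
that the file format is recognizable on disk: an Arrow IPC file starts and
ends with the magic bytes `ARROW1`, and (as far as I know) a Feather V1 file
starts and ends with `FEA1`, while the streaming format has no such framing.
A rough sketch of a sniffing helper, using only the standard library; the
function name `sniff_format` and its return labels are made up for
illustration, and it does not validate the file beyond these bytes:

```python
import os

ARROW_MAGIC = b"ARROW1"     # opens and closes an Arrow IPC *file*
FEATHER_V1_MAGIC = b"FEA1"  # assumed to open and close a Feather V1 file


def sniff_format(path):
    """Guess the on-disk format of `path` from its first and last bytes."""
    n = len(ARROW_MAGIC)
    if os.path.getsize(path) < 2 * n:
        return "unknown"  # too short to carry both markers
    with open(path, "rb") as f:
        head = f.read(n)
        f.seek(-n, os.SEEK_END)
        tail = f.read(n)
    if head == ARROW_MAGIC and tail == ARROW_MAGIC:
        return "arrow-file"
    if head.startswith(FEATHER_V1_MAGIC) and tail.endswith(FEATHER_V1_MAGIC):
        return "feather-v1"
    # Could be an IPC stream (no file-level magic) or something else entirely.
    return "unknown"
```

Note that a file written as "Feather V2" would be reported as `arrow-file`
here, since it is the same format under a different name.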

If we want to move the (mostly Python and R) ecosystem away from "Feather",
I think we should have a clear recommendation of what to use instead.

On Wed, 31 Aug 2022 at 20:33, Aldrin <akmon...@ucsc.edu.invalid> wrote:

> Similarly to Micah, I mentally think of "Arrow IPC" as a format that is
> optimized for "IPC", which I have assumed means it minimizes CPU overhead
> when using data read from storage because it's already in a memory-friendly
> format (e.g. minimal deserialization).
>
> Not sure the "IPC" is necessary, but it does push the intent into the name
> (unless it's actually a misnomer).
>
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
>
> On Tue, Aug 30, 2022 at 8:29 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > I think one source of ambiguity for Arrow files, at least for me, is
> > whether they are just a string of messages concatenated or they are the
> > files that contain the metadata footer.
> >
> > On Tue, Aug 30, 2022 at 5:11 AM Dewey Dunnington
> > <de...@voltrondata.com.invalid> wrote:
> >
> > > Ian has a very good point...I would be in favour of calling them "Arrow
> > > files" wherever possible since there's no need to know or care what
> > > interprocess communication is to use them!
> > >
> > > On Mon, Aug 29, 2022 at 6:50 PM Ian Cook <i...@ursacomputing.com> wrote:
> > >
> > > > +1 We should explicitly discourage further use of “Feather” to refer to
> > > > Arrow IPC files.
> > > >
> > > > In this spirit of simplifying terminology: Does the “IPC” in the term
> > > > “Arrow IPC files” serve a truly necessary purpose? Is there another type
> > > > of “Arrow file” that the “IPC” serves to disambiguate? If not, can we
> > > > simply refer to these files as “Arrow files” in most places in the
> > > > documentation and website? (In a few important places we should clarify
> > > > that when we say “Arrow file” we are referring to a file that uses the
> > > > Arrow IPC file format.)
> > > >
> > > > Ian
> > > >
> > > > On Mon, Aug 29, 2022 at 17:33 Sutou Kouhei <k...@clear-code.com> wrote:
> > > >
> > > > > +1 for 1.
> > > > >
> > > > > Thanks,
> > > > > --
> > > > > kou
> > > > >
> > > > > In <CAOYPqDCAib2wBKaKnRij9=__OsUJJghVq1UUTNibK2T0Np+=r...@mail.gmail.com>
> > > > >   "Re: Usage of the name Feather?" on Mon, 29 Aug 2022 20:18:37 +0200,
> > > > >   Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> > > > >
> > > > > > I agree.
> > > > > >
> > > > > > I suspect that the most widely used API with "feather" is Pandas'
> > > > > > read_feather.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, 29 Aug 2022, 19:55 Weston Pace, <weston.p...@gmail.com> wrote:
> > > > > >
> > > > > >> I agree as well.  I think most lingering uses of the term "feather"
> > > > > >> are in pyarrow and R however, so it might be good to hear from some of
> > > > > >> those maintainers.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On Mon, Aug 29, 2022 at 9:35 AM Antoine Pitrou <anto...@python.org> wrote:
> > > > > >> >
> > > > > >> >
> > > > > >> > I agree with this as well.
> > > > > >> >
> > > > > >> > Regards
> > > > > >> >
> > > > > >> > Antoine.
> > > > > >> >
> > > > > >> >
> > > > > >> > On Mon, 29 Aug 2022 11:29:45 -0400
> > > > > >> > Andrew Lamb <al...@influxdata.com> wrote:
> > > > > >> > > In the rust implementation we use the term "Arrow IPC" and I support
> > > > > >> > > your option 1:
> > > > > >> > >
> > > > > >> > > > The name Feather V2 is deprecated. Only the extension ".arrow" will
> > > > > >> > > > be used for IPC files.
> > > > > >> > >
> > > > > >> > > Andrew
> > > > > >> > >
> > > > > >> > > On Mon, Aug 29, 2022 at 11:21 AM Matthew Topol
> > > > > >> > > <m...@voltrondata.com.invalid> wrote:
> > > > > >> > >
> > > > > >> > > > When I wrote "In-Memory Analytics with Apache Arrow" I definitely
> > > > > >> > > > treated "Feather" as deprecated and mentioned it only in passing
> > > > > >> > > > specifically indicating "Arrow IPC" as the terminology to use. I only
> > > > > >> > > > even mentioned "Feather" at all because there are still methods in
> > > > > >> > > > pyarrow that reference it by name.
> > > > > >> > > >
> > > > > >> > > > That's just my opinion though...
> > > > > >> > > >
> > > > > >> > > > On Mon, Aug 29 2022 at 11:08:53 AM -0400, David Li
> > > > > >> > > > <lidav...@apache.org> wrote:
> > > > > >> > > > > This has come up before, e.g. see [1] [2] [3].
> > > > > >> > > > >
> > > > > >> > > > > I would say "Feather" is effectively deprecated and we are using
> > > > > >> > > > > "Arrow IPC" now but I am not sure what others think. (From that
> > > > > >> > > > > GitHub link, it seems to be mixed.) And ".arrow" is the official
> > > > > >> > > > > extension now (since it is registered as part of our MIME type).
> > > > > >> > > > > But there's existing documentation and not everything has been
> > > > > >> > > > > updated to be consistent (as you saw).
> > > > > >> > > > >
> > > > > >> > > > > [1]: <https://lists.apache.org/thread/0s6lgvd3g56ymd60vl5lgzhf4ro6hts5>
> > > > > >> > > > > [2]: <https://arrow.apache.org/faq/#what-about-the-feather-file-format>
> > > > > >> > > > > [3]: <https://stackoverflow.com/questions/67910612/arrow-ipc-vs-feather/67911190#67911190>
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > -David
> > > > > >> > > > >
> > > > > >> > > > > On Mon, Aug 29, 2022, at 10:50, 島 達也 wrote:
> > > > > >> > > > >>  Hi all.
> > > > > >> > > > >>
> > > > > >> > > > >>  I know the documentation (mainly pyarrow documentation) sometimes
> > > > > >> > > > >>  refers to IPC files as Feather files, but are there any guidelines
> > > > > >> > > > >>  for when to refer to an IPC file as a Feather file and when to
> > > > > >> > > > >>  refer to it as an IPC file?
> > > > > >> > > > >>  I believe that calling the same file an Arrow IPC file at times
> > > > > >> > > > >>  and a Feather file at other times is confusing to those unfamiliar
> > > > > >> > > > >>  with Apache Arrow (myself included).
> > > > > >> > > > >>  Surprisingly, these files may even have completely different
> > > > > >> > > > >>  extensions, ".arrow" and ".feather", which are not similar.
> > > > > >> > > > >>
> > > > > >> > > > >>  Perhaps there are several options for future use of the name
> > > > > >> > > > >>  Feather, such as
> > > > > >> > > > >>
> > > > > >> > > > >>   1. The name Feather V2 is deprecated. Only the extension ".arrow"
> > > > > >> > > > >>      will be used for IPC files.
> > > > > >> > > > >>   2. In some contexts(?), IPC files are referred to as Feather; only
> > > > > >> > > > >>      ".arrow" is used for the IPC file extension to clearly
> > > > > >> > > > >>      distinguish it from Feather V1's ".feather".
> > > > > >> > > > >>   3. When an IPC file is called Feather by some rule, extension
> > > > > >> > > > >>      ".feather" is used, and when an IPC file is not called Feather,
> > > > > >> > > > >>      extension ".arrow" is used.
> > > > > >> > > > >>
> > > > > >> > > > >>  I mistakenly thought the current status was 2, but according to the
> > > > > >> > > > >>  discussion in this PR (<https://github.com/apache/arrow/pull/13677>),
> > > > > >> > > > >>  apparently the current status seems 3. (However, there seems to be
> > > > > >> > > > >>  no rule as to when an IPC file should be called a Feather)
> > > > > >> > > > >>
> > > > > >> > > > >>  I am not very familiar with Arrow and this is my first post to this
> > > > > >> > > > >>  mailing list so I apologize if I have done something wrong or
> > > > > >> > > > >>  inappropriate.
> > > > > >> > > > >>
> > > > > >> > > > >>  Best,
> > > > > >> > > > >>  SHIMA Tatsuya
