Hi,

It seems that there are no more opinions.

Weston, could you clean up the draft? Then we can start a vote.


Thanks,
--
kou

In <cae4ayb1__9ge5g4k_n7agtuitz0w7ur3vzht_yrsts9rzod...@mail.gmail.com>
  "Re: Please Review: Application for a Media Type" on Wed, 28 Apr 2021 14:30:47 -1000,
  Weston Pace <weston.p...@gmail.com> wrote:

> +1 for .arrows from me.  I agree that .stream is too generic.
> 
> 
> On Thu, Apr 22, 2021 at 7:42 PM Sutou Kouhei <k...@clear-code.com> wrote:
>>
>> Hi,
>>
>> I feel that '.stream' is too generic. How about '.arrows'?
>> JSON Lines uses an 'l' suffix for its extension: '.jsonl'
>>
>> https://jsonlines.org/#conventions
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In <CAOYPqDBg0Y3W5s4S72brQ=cdn6a7g6e0bbccximxqulataz...@mail.gmail.com>
>>   "Re: Please Review: Application for a Media Type" on Thu, 22 Apr 2021 06:44:51 +0200,
>>   Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
>>
>> > Thanks for driving this, exciting stuff!
>> >
>> > I went through it, left minor comments, it looks good to me.
>> >
>> > wrt the extension: imo they should be different as the formats are not
>> > interchangeable.
>> >
>> > AFAIK `.stream` is not taken: it was used by Adobe Shockwave but it was
>> > discontinued [1].
>> > So, .arrow and .stream may be sufficient.
>> >
>> > [1] https://helpx.adobe.com/shockwave/shockwave-end-of-life-faq.html
>> >
>> > Best,
>> > Jorge
>> >
>> >
>> > On Thu, Apr 22, 2021 at 3:35 AM Sutou Kouhei <k...@clear-code.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Thanks for updating the draft.
>> >>
>> >> I want to wait for at least a week before we start a vote.
>> >> Does anyone have an opinion about the file extension for Apache
>> >> Arrow format data? What do you think about ".arrow"?
>> >>
>> >>
>> >> Thanks,
>> >> --
>> >> kou
>> >>
>> >> In <cae4ayb1bo0fozh4oy5hkzhv5oq6s-bqywfdgd7cpjb1czct...@mail.gmail.com>
>> >>   "Re: Please Review: Application for a Media Type" on Wed, 21 Apr 2021 08:17:40 -1000,
>> >>   Weston Pace <weston.p...@gmail.com> wrote:
>> >>
>> >> > Thank you for reviewing.  I have added your suggestions to the draft.
>> >> > Are we ready for a vote?  If so I will clean up the comments and send
>> >> > out a clean version of the draft.
>> >> >
>> >> > On Mon, Apr 19, 2021 at 3:10 PM Sutou Kouhei <k...@clear-code.com> wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> Sorry for not responding to this...
>> >> >>
>> >> >> Weston, thanks for writing up the draft!
>> >> >>
>> >> >> https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
>> >> >>
>> >> >> Here are the items we need to discuss before we apply to
>> >> >> IANA for a media type:
>> >> >>
>> >> >> 1. Interoperability Considerations
>> >> >>
>> >> >> Draft:
>> >> >>
>> >> >> > The Apache arrow format is intended to be a language
>> >> >> > independent columnar memory format for flat and
>> >> >> > hierarchical data.  It has been shown to work in a variety
>> >> >> > of languages and applications.  Arrow files can be
>> >> >> > provided in two different formats, a streaming format
>> >> >> > (vnd.apache.arrow.stream) and a random access format
>> >> >> > (vnd.apache.arrow.file).  Applications should be aware of
>> >> >> > which format they are processing as the two are not
>> >> >> > interchangeable.
>> >> >>
>> >> >> Note in draft:
>> >> >>
>> >> >> > Should we mention something like "applications should
>> >> >> > make sure to check the 'version' field to ensure they
>> >> >> > can process the file"?
>> >> >>
>> >> >> How about referring to our format document for further
>> >> >> information instead of mentioning the 'version' field?
>> >> >> https://arrow.apache.org/docs/format/Columnar.html
>> >> >>
>> >> >> XML Media Types also refers to the XML specification for
>> >> >> further information:
>> >> >>
>> >> >> https://tools.ietf.org/html/rfc7303#section-9.1
>> >> >>
>> >> >> > For further information, see Section 2.9 "Standalone
>> >> >> > Document Declaration" and Section 5 "Conformance" of [XML].
>> >> >>
>> >> >>
>> >> >> 2. File extension(s)
>> >> >>
>> >> >> Draft:
>> >> >>
>> >> >> > N/A
>> >> >>
>> >> >> Note in draft:
>> >> >>
>> >> >> > Again, there are no formal extensions that have been
>> >> >> > recommended before.  Do we want to introduce any?  I'm
>> >> >> > pretty sure this is in no way binding (and it's unlikely
>> >> >> > anyone will ever see it).
>> >> >>
>> >> >> I want to recommend extensions so that we avoid a spread of
>> >> >> different extensions for Apache Arrow formats.
>> >> >>
>> >> >> How about the following?
>> >> >>
>> >> >>   * vnd.apache.arrow.file: .arrow
>> >> >>   * vnd.apache.arrow.stream: NA
>> >> >>     (Generally, this format isn't saved as a file. It is
>> >> >>     used for pipes, sending/receiving via sockets, and so on.)
>> >> >>
>> >> >> FYI: Here is a list of the extensions used in our code
>> >> >> base.
>> >> >>
>> >> >> Our integration test uses the following extensions:
>> >> >>
>> >> >>   * vnd.apache.arrow.file: .arrow_file
>> >> >>   * vnd.apache.arrow.stream: .stream
>> >> >>
>> >> >>
>> >> >> https://github.com/apache/arrow/blob/master/dev/archery/archery/integration/runner.py#L250-L257
>> >> >>
>> >> >>     log('-- Validating file')
>> >> >>     producer_file_path = os.path.join(
>> >> >>         gold_dir, "generated_" + test_case.name + ".arrow_file")
>> >> >>     consumer.validate(json_path, producer_file_path)
>> >> >>
>> >> >>     log('-- Validating stream')
>> >> >>     consumer_stream_path = os.path.join(
>> >> >>         gold_dir, "generated_" + test_case.name + ".stream")
>> >> >>
>> >> >> Our C++ tests use the following extensions:
>> >> >>
>> >> >>   * vnd.apache.arrow.file: Not used (in-memory buffer is used)
>> >> >>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
>> >> >>
>> >> >> Our C++ examples use the following extensions:
>> >> >>
>> >> >>   * vnd.apache.arrow.file: .arrow
>> >> >>   * vnd.apache.arrow.stream: NA
>> >> >>
>> >> >>
>> >> >> https://github.com/apache/arrow/blob/master/cpp/examples/minimal_build/example.cc#L34
>> >> >>
>> >> >>     const char* arrow_filename = "test.arrow";
>> >> >>
>> >> >> Our Python documentation uses the following extensions:
>> >> >>
>> >> >>   * vnd.apache.arrow.file: .arrow
>> >> >>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
>> >> >>
>> >> >>
>> >> >> https://github.com/apache/arrow/blob/master/docs/source/python/filesystems.rst
>> >> >>
>> >> >>    with local.open_output_stream("test.arrow") as file:
>> >> >>
>> >> >> Our Go tests use the following extensions:
>> >> >>
>> >> >>   * vnd.apache.arrow.file: Not used (no extension)
>> >> >>   * vnd.apache.arrow.stream: Not used (no extension)
>> >> >>
>> >> >> Our Java tests use the following extensions:
>> >> >>
>> >> >>   * vnd.apache.arrow.file: .arrow
>> >> >>   * vnd.apache.arrow.stream: .arrow, but most tests use an
>> >> >>     in-memory buffer
>> >> >>
>> >> >>
>> >> >> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowFile.java#L51
>> >> >>
>> >> >>     File file = new File("target/mytest_write.arrow");
>> >> >>
>> >> >>
>> >> >> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestRoundTrip.java#L176
>> >> >>
>> >> >>     final File temp = File.createTempFile("arrow-test-" + name + "-", ".arrow");
>> >> >>
>> >> >> Our JavaScript tests use the following extensions:
>> >> >>
>> >> >>   * vnd.apache.arrow.file: Not used (in-memory buffer is used)
>> >> >>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
>> >> >>
>> >> >> Our Julia tests use the following extensions:
>> >> >>
>> >> >>   * vnd.apache.arrow.file: Not used (in-memory buffer is used)
>> >> >>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
>> >> >>
>> >> >> Our Rust tests use the following extensions:
>> >> >>
>> >> >>   * vnd.apache.arrow.file: .arrow_file
>> >> >>   * vnd.apache.arrow.stream: .stream
>> >> >>
>> >> >> Note that they use data in our integration test.
>> >> >>
>> >> >>
>> >> >> Thanks,
>> >> >> --
>> >> >> kou
>> >> >>
>> >> >> In <cajpuwmckzuppmol-o0+d6fjwk-eas2teyf_pw0qzthhvx-9...@mail.gmail.com>
>> >> >>   "Re: Please Review: Application for a Media Type" on Fri, 22 Jan 2021 14:37:35 -0600,
>> >> >>   Wes McKinney <wesmck...@gmail.com> wrote:
>> >> >>
>> >> >> > Thank you for taking the lead on this. I gave it a brief read through
>> >> >> > and I think it makes sense to use Thrift or Protocol Buffers as a
>> >> >> > guideline. It would be good for some others to review who might be
>> >> >> > familiar with IANA media formats.
>> >> >> >
>> >> >> > On Wed, Jan 20, 2021 at 6:17 PM Weston Pace <weston.p...@gmail.com> wrote:
>> >> >> >>
>> >> >> >> Per a previous discussion
>> >> >> >> (https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E)
>> >> >> >> and the resulting JIRA issue ARROW-7396
>> >> >> >> (https://issues.apache.org/jira/browse/ARROW-7396) there is a desire
>> >> >> >> to register the Arrow format with the IANA as a formal media type
>> >> >> >> (actually two media types, one for the streaming format and one for
>> >> >> >> the file format).
>> >> >> >>
>> >> >> >> The form for applying is here: https://www.iana.org/form/media-types
>> >> >> >>
>> >> >> >> I have created a draft registration document (link below).
>> >> >> >>
>> >> >> >> The only fields with any real flexibility are "Security
>> >> >> >> Considerations", "Interoperability Considerations", and "Application
>> >> >> >> Usage".  I reviewed the applications for XML, JSON, and Thrift and
>> >> >> >> I've made a best attempt at these fields as well as posted examples
>> >> >> >> from the other languages.  Please review and feel free to suggest
>> >> >> >> changes.
>> >> >> >>
>> >> >> >>
>> >> >> >> https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
>> >> >> >>
>> >> >> >> Once we align on the content we should probably have a PMC member
>> >> >> >> actually make the submission and be listed as contact person.
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >>
>> >> >> >> Weston Pace
>> >> >> >> Ursa Computing
>> >>
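
For reference, the extension proposal discussed in this thread
(".arrow" for vnd.apache.arrow.file and ".arrows" for
vnd.apache.arrow.stream) could be wired into an application roughly as
below. This is only a sketch using Python's standard mimetypes module.
The "application/" prefix on the media type names, the helper name
guess_arrow_media_type and the sample file names are assumptions for
illustration. They are not taken from the draft, and nothing here
implies the types are already registered with IANA.

```python
import mimetypes

# Assumed full media type names: the thread only gives the
# "vnd.apache.arrow.*" part, so the "application/" prefix is a guess.
ARROW_FILE_TYPE = "application/vnd.apache.arrow.file"
ARROW_STREAM_TYPE = "application/vnd.apache.arrow.stream"

# Use a private registry so the process-wide defaults are left untouched.
arrow_types = mimetypes.MimeTypes()
arrow_types.add_type(ARROW_FILE_TYPE, ".arrow")     # random access format
arrow_types.add_type(ARROW_STREAM_TYPE, ".arrows")  # streaming format


def guess_arrow_media_type(path):
    """Return the Arrow media type implied by the extension, or None."""
    media_type, _encoding = arrow_types.guess_type(path)
    if media_type in (ARROW_FILE_TYPE, ARROW_STREAM_TYPE):
        return media_type
    return None


if __name__ == "__main__":
    for name in ("table.arrow", "batches.arrows", "batches.stream"):
        print(name, "->", guess_arrow_media_type(name))
```

Because the two formats are not interchangeable, a reader would still
need to pick the matching file or stream reader after this lookup. The
extension is only a hint about which one to try.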
