Sorry about the delay. I agree that .arrows is both more specific and more informative.
Best,
Jorge

On Sat, May 1, 2021 at 1:11 AM Sutou Kouhei <k...@clear-code.com> wrote:

> Hi,
>
> It seems that there are no more opinions.
>
> Weston, could you clean up the draft? Then we can start a vote.
>
>
> Thanks,
> --
> kou
>
> In <cae4ayb1__9ge5g4k_n7agtuitz0w7ur3vzht_yrsts9rzod...@mail.gmail.com>
> "Re: Please Review: Application for a Media Type" on Wed, 28 Apr 2021 14:30:47 -1000,
> Weston Pace <weston.p...@gmail.com> wrote:
>
> > +1 for .arrows from me. I agree that .stream is too generic.
> >
> >
> > On Thu, Apr 22, 2021 at 7:42 PM Sutou Kouhei <k...@clear-code.com> wrote:
> >>
> >> Hi,
> >>
> >> I feel that '.stream' is too generic. How about '.arrows'?
> >> JSON Lines uses 'l' suffix for extension: '.jsonl'
> >>
> >> https://jsonlines.org/#conventions
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In <CAOYPqDBg0Y3W5s4S72brQ=cdn6a7g6e0bbccximxqulataz...@mail.gmail.com>
> >> "Re: Please Review: Application for a Media Type" on Thu, 22 Apr 2021 06:44:51 +0200,
> >> Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> >>
> >> > Thanks for driving this, exciting stuff!
> >> >
> >> > I went through it, left minor comments, it looks good to me.
> >> >
> >> > wrt the extension: imo they should be different as the formats are not
> >> > interchangeable.
> >> >
> >> > AFAIK `.stream` is not taken: it was used by Adobe Shockwave but it was
> >> > discontinued [1].
> >> > So, .arrow and .stream may be sufficient.
> >> >
> >> > [1] https://helpx.adobe.com/shockwave/shockwave-end-of-life-faq.html
> >> >
> >> > Best,
> >> > Jorge
> >> >
> >> >
> >> > On Thu, Apr 22, 2021 at 3:35 AM Sutou Kouhei <k...@clear-code.com> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> Thanks for updating the draft.
> >> >>
> >> >> I want to wait for at least a week before we start a vote.
> >> >> Does anyone have an opinion about file extension of Apache
> >> >> Arrow format data? What do you think about ".arrow"?
> >> >>
> >> >>
> >> >> Thanks,
> >> >> --
> >> >> kou
> >> >>
> >> >> In <cae4ayb1bo0fozh4oy5hkzhv5oq6s-bqywfdgd7cpjb1czct...@mail.gmail.com>
> >> >> "Re: Please Review: Application for a Media Type" on Wed, 21 Apr 2021 08:17:40 -1000,
> >> >> Weston Pace <weston.p...@gmail.com> wrote:
> >> >>
> >> >> > Thank you for reviewing. I have added your suggestions to the draft.
> >> >> > Are we ready for a vote? If so I will clean up the comments and send
> >> >> > out a clean version of the draft.
> >> >> >
> >> >> > On Mon, Apr 19, 2021 at 3:10 PM Sutou Kouhei <k...@clear-code.com> wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> Sorry for not responding to this...
> >> >> >>
> >> >> >> Weston, thanks for writing up the draft!
> >> >> >>
> >> >> >> https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
> >> >> >>
> >> >> >> Here are items we need to discuss before we apply a media
> >> >> >> type to IANA:
> >> >> >>
> >> >> >> 1. Interoperability Considerations
> >> >> >>
> >> >> >> Draft:
> >> >> >>
> >> >> >> > The Apache arrow format is intended to be a language
> >> >> >> > independent columnar memory format for flat and
> >> >> >> > hierarchical data. It has been shown to work in a variety
> >> >> >> > of languages and applications. Arrow files can be
> >> >> >> > provided in two different formats, a streaming format
> >> >> >> > (vnd.apache.arrow.stream) and a random access format
> >> >> >> > (vnd.apache.arrow.file). Applications should be aware of
> >> >> >> > which format they are processing as the two are not
> >> >> >> > interchangeable.
> >> >> >>
> >> >> >> Note in draft:
> >> >> >>
> >> >> >> > Should we mention something like "applications should
> >> >> >> > make sure to check the 'version' field to ensure they
> >> >> >> > can process the file"?
> >> >> >>
> >> >> >> How about referring to our format document for further
> >> >> >> information instead of mentioning the 'version' field?
> >> >> >> https://arrow.apache.org/docs/format/Columnar.html
> >> >> >>
> >> >> >> XML Media Types also refers to the XML specification for
> >> >> >> further information:
> >> >> >>
> >> >> >> https://tools.ietf.org/html/rfc7303#section-9.1
> >> >> >>
> >> >> >> > For further information, see Section 2.9 "Standalone
> >> >> >> > Document Declaration" and Section 5 "Conformance" of [XML].
> >> >> >>
> >> >> >>
> >> >> >> 2. File extension(s)
> >> >> >>
> >> >> >> Draft:
> >> >> >>
> >> >> >> > N/A
> >> >> >>
> >> >> >> Note in draft:
> >> >> >>
> >> >> >> > Again, there are no formal extensions that have been
> >> >> >> > recommended before. Do we want to introduce any? I'm
> >> >> >> > pretty sure this is in no way binding (and it's unlikely
> >> >> >> > anyone will ever see it).
> >> >> >>
> >> >> >> I want recommended extensions to avoid spreading various
> >> >> >> extensions for Apache Arrow formats.
> >> >> >>
> >> >> >> How about the following?
> >> >> >>
> >> >> >> * vnd.apache.arrow.file: .arrow
> >> >> >> * vnd.apache.arrow.stream: NA
> >> >> >> (Generally, this format isn't saved as a file. This format
> >> >> >> is used for pipes, sending/receiving via sockets and so on.)
> >> >> >>
> >> >> >> FYI: Here is a list that shows the extensions used in our code
> >> >> >> base.
> >> >> >>
> >> >> >> Our integration test uses the following extensions:
> >> >> >>
> >> >> >> * vnd.apache.arrow.file: .arrow_file
> >> >> >> * vnd.apache.arrow.stream: .stream
> >> >> >>
> >> >> >> https://github.com/apache/arrow/blob/master/dev/archery/archery/integration/runner.py#L250-L257
> >> >> >>
> >> >> >> log('-- Validating file')
> >> >> >> producer_file_path = os.path.join(
> >> >> >>     gold_dir, "generated_" + test_case.name + ".arrow_file")
> >> >> >> consumer.validate(json_path, producer_file_path)
> >> >> >>
> >> >> >> log('-- Validating stream')
> >> >> >> consumer_stream_path = os.path.join(
> >> >> >>     gold_dir, "generated_" + test_case.name + ".stream")
> >> >> >>
> >> >> >> Our C++ tests use the following extensions:
> >> >> >>
> >> >> >> * vnd.apache.arrow.file: Not used (in-memory buffer is used)
> >> >> >> * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
> >> >> >>
> >> >> >> Our C++ examples use the following extensions:
> >> >> >>
> >> >> >> * vnd.apache.arrow.file: .arrow
> >> >> >> * vnd.apache.arrow.stream: NA
> >> >> >>
> >> >> >> https://github.com/apache/arrow/blob/master/cpp/examples/minimal_build/example.cc#L34
> >> >> >>
> >> >> >> const char* arrow_filename = "test.arrow";
> >> >> >>
> >> >> >> Our Python documentation uses the following extensions:
> >> >> >>
> >> >> >> * vnd.apache.arrow.file: .arrow
> >> >> >> * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
> >> >> >>
> >> >> >> https://github.com/apache/arrow/blob/master/docs/source/python/filesystems.rst
> >> >> >>
> >> >> >> with local.open_output_stream("test.arrow") as file:
> >> >> >>
> >> >> >> Our Go tests use the following extensions:
> >> >> >>
> >> >> >> * vnd.apache.arrow.file: Not used (no extension)
> >> >> >> * vnd.apache.arrow.stream: Not used (no extension)
> >> >> >>
> >> >> >> Our Java tests use the following extensions:
> >> >> >>
> >> >> >> * vnd.apache.arrow.file: .arrow
> >> >> >> * vnd.apache.arrow.stream: .arrow but most of the tests use an in-memory buffer
> >> >> >>
> >> >> >> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowFile.java#L51
> >> >> >>
> >> >> >> File file = new File("target/mytest_write.arrow");
> >> >> >>
> >> >> >> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestRoundTrip.java#L176
> >> >> >>
> >> >> >> final File temp = File.createTempFile("arrow-test-" + name + "-", ".arrow");
> >> >> >>
> >> >> >> Our JavaScript tests use the following extensions:
> >> >> >>
> >> >> >> * vnd.apache.arrow.file: Not used (in-memory buffer is used)
> >> >> >> * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
> >> >> >>
> >> >> >> Our Julia tests use the following extensions:
> >> >> >>
> >> >> >> * vnd.apache.arrow.file: Not used (in-memory buffer is used)
> >> >> >> * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
> >> >> >>
> >> >> >> Our Rust tests use the following extensions:
> >> >> >>
> >> >> >> * vnd.apache.arrow.file: .arrow_file
> >> >> >> * vnd.apache.arrow.stream: .stream
> >> >> >>
> >> >> >> Note that they use data in our integration test.
> >> >> >>
> >> >> >>
> >> >> >> Thanks,
> >> >> >> --
> >> >> >> kou
> >> >> >>
> >> >> >> In <cajpuwmckzuppmol-o0+d6fjwk-eas2teyf_pw0qzthhvx-9...@mail.gmail.com>
> >> >> >> "Re: Please Review: Application for a Media Type" on Fri, 22 Jan 2021 14:37:35 -0600,
> >> >> >> Wes McKinney <wesmck...@gmail.com> wrote:
> >> >> >>
> >> >> >> > Thank you for taking the lead on this. I gave a brief read through and
> >> >> >> > I think it makes sense using Thrift or Protocol Buffers as a
> >> >> >> > guideline. Would be good for some others to review who might be
> >> >> >> > familiar with IANA media formats
> >> >> >> >
> >> >> >> > On Wed, Jan 20, 2021 at 6:17 PM Weston Pace <weston.p...@gmail.com> wrote:
> >> >> >> >>
> >> >> >> >> Per a previous discussion
> >> >> >> >> (https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E)
> >> >> >> >> and the resulting JIRA issue ARROW-7396
> >> >> >> >> (https://issues.apache.org/jira/browse/ARROW-7396) there is a desire
> >> >> >> >> to register the arrow format with the IANA as a formal media type
> >> >> >> >> (actually two media types, one for the streaming format and one for
> >> >> >> >> the file format).
> >> >> >> >>
> >> >> >> >> The form for applying is here: https://www.iana.org/form/media-types
> >> >> >> >>
> >> >> >> >> I have created a draft registration document (link below).
> >> >> >> >>
> >> >> >> >> The only fields with any real flexibility are "Security
> >> >> >> >> Considerations", "Interoperability Considerations", and "Application
> >> >> >> >> Usage". I reviewed the applications for XML, JSON, and Thrift and
> >> >> >> >> I've made a best attempt at these fields as well as posted examples
> >> >> >> >> from the other languages. Please review and feel free to suggest
> >> >> >> >> changes.
> >> >> >> >>
> >> >> >> >> https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
> >> >> >> >>
> >> >> >> >> Once we align on the content we should probably have a PMC member
> >> >> >> >> actually make the submission and be listed as contact person.
> >> >> >> >>
> >> >> >> >> Thanks,
> >> >> >> >>
> >> >> >> >> Weston Pace
> >> >> >> >> Ursa Computing
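
For anyone who wants to try the extensions discussed in this thread, here is a minimal sketch of how an application might map them to the two media types using Python's standard `mimetypes` module. The pairing of `.arrow` with the file format and `.arrows` with the stream format follows the proposal above. The `application/` prefix, the `ARROW_MEDIA_TYPES` table and the `register_arrow_types` helper are assumptions made for illustration, not something the draft specifies.

```python
import mimetypes

# Assumed mapping based on the extensions proposed in this thread.
# The "application/" top-level type is an assumption about the registration.
ARROW_MEDIA_TYPES = {
    ".arrow": "application/vnd.apache.arrow.file",     # random access (file) format
    ".arrows": "application/vnd.apache.arrow.stream",  # streaming format
}


def register_arrow_types(db=None):
    """Register the Arrow extensions on a MimeTypes database (hypothetical helper)."""
    if db is None:
        db = mimetypes.MimeTypes()
    for extension, media_type in ARROW_MEDIA_TYPES.items():
        db.add_type(media_type, extension)
    return db


db = register_arrow_types()
file_type, _ = db.guess_type("table.arrow")
stream_type, _ = db.guess_type("batches.arrows")
print(file_type)
print(stream_type)
```

Keeping two distinct extensions lets a lookup like this tell the formats apart by name alone, which matters because the two formats are not interchangeable.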