As for buffering data before making a call to write(): in Arrow 0.10.0 you'll be able to use BufferedOutputStream for this, see
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/buffered.h
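
To illustrate the general idea of buffering before write(), here is a minimal sketch. It is not Arrow's BufferedOutputStream; the class name BufferedFdWriter and its methods are hypothetical, and it assumes a POSIX system with a blocking file descriptor. Small writes are accumulated in memory and passed to write(2) only when the buffer would overflow or on an explicit flush.

```cpp
#include <unistd.h>  // ::write

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an Arrow class): collects small writes in memory
// and hands them to write(2) in large chunks.
class BufferedFdWriter {
 public:
  BufferedFdWriter(int fd, std::size_t capacity)
      : fd_(fd), capacity_(capacity) {
    assert(fd >= 0);
    buf_.reserve(capacity_);
  }

  // Appends n bytes; touches the kernel only when the buffer would overflow.
  bool Write(const void* data, std::size_t n) {
    const char* p = static_cast<const char*>(data);
    if (buf_.size() + n > capacity_ && !Flush()) return false;
    if (n >= capacity_) return WriteAll(p, n);  // too large to buffer
    buf_.insert(buf_.end(), p, p + n);
    return true;
  }

  // Pushes any pending bytes to the file descriptor.
  bool Flush() {
    if (!WriteAll(buf_.data(), buf_.size())) return false;
    buf_.clear();
    return true;
  }

  // Number of write(2) calls issued so far (for illustration only).
  std::size_t syscalls() const { return syscalls_; }

 private:
  // Loops until all n bytes are written, retrying on EINTR.
  bool WriteAll(const char* p, std::size_t n) {
    while (n > 0) {
      ssize_t w = ::write(fd_, p, n);
      ++syscalls_;
      if (w < 0) {
        if (errno == EINTR) continue;
        return false;
      }
      p += w;
      n -= static_cast<std::size_t>(w);
    }
    return true;
  }

  int fd_;
  std::size_t capacity_;
  std::vector<char> buf_;
  std::size_t syscalls_ = 0;
};
```

With a 4096-byte buffer, a thousand 8-byte records reach the kernel in a couple of write(2) calls rather than a thousand. The caller has to call Flush() before closing the descriptor, otherwise the tail of the data is lost.
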

Regards

Antoine.


On 09/05/2018 at 17:39, Ambalu, Robert wrote:
> I don't have any offhand, no, but I would imagine that direct file writes
> will at some point need to make a system call, which is expensive (fwrite
> might buffer before eventually making the sys call, but it looks like
> FileOutputStream uses the raw system write for every write call).
> The current MMap io interface isn't usable as a streaming output
> unfortunately, though I suppose I could just implement my own.
>
> -----Original Message-----
> From: Antoine Pitrou [mailto:solip...@pitrou.net]
> Sent: Wednesday, May 09, 2018 11:11 AM
> To: dev@arrow.apache.org
> Subject: Re: Question about streaming to memorymapped files
>
>
> Do you know of any benchmark numbers / performance studies about this?
> While it's true that a memory-mapped file avoids explicit system calls,
> I've heard file I/O is quite well optimized, at least on Linux,
> nowadays.
>
> Regards
>
> Antoine.
>
>
> On Wed, 9 May 2018 14:47:53 +0000
> "Ambalu, Robert" <robert.amb...@point72.com> wrote:
>> Antoine, thanks for the quick reply.
>> You can actually grow memory-mapped files with a mremap call (and I think a
>> seek/write on the file). I do this in my applications and it works fine.
>> I want the efficiency of writing via memory maps, so I would prefer to avoid
>> FileOutputStream.
>>
>> -----Original Message-----
>> From: Antoine Pitrou [mailto:anto...@python.org]
>> Sent: Wednesday, May 09, 2018 10:37 AM
>> To: dev@arrow.apache.org
>> Subject: Re: Question about streaming to memorymapped files
>>
>>
>> Hi,
>>
>> If you don't know the output size upfront then you should probably use a
>> FileOutputStream instead. By definition, memory mapped files must have
>> a fixed size (since they are mapped to a fixed area in virtual memory).
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 09/05/2018 at 16:31, Ambalu, Robert wrote:
>>> Hey, I'm looking into streaming table updates into a memory mapped file
>>> (C++).
>>> I think I have everything I need (MemoryMappedFile output streamer,
>>> RecordBatchStreamWriter) but I don't understand how to properly create the
>>> memmap file. It looks like it requires you to preset a size for the file
>>> when you create it, but since I'll be streaming I don't actually know how
>>> big a file I'm going to need...
>>> Am I missing some other API point here? Any reason why size is required up
>>> front and the memmap doesn't auto-grow as needed?
>>>
>>> Thanks in advance
>>> - Rob
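
For reference, the grow-on-demand approach mentioned in the thread (extend the file, then mremap the mapping) can be sketched roughly as follows. This is only an illustration under stated assumptions: it is Linux-specific (mremap is not POSIX), it is not how Arrow's MemoryMappedFile works, and the class name GrowableMmapWriter and its methods are hypothetical. The file is extended with ftruncate before remapping, since touching a shared mapping beyond the end of the file raises SIGBUS. Error handling is kept minimal.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // mremap is a Linux extension
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical append-only writer backed by a memory map that doubles
// its capacity whenever an append would not fit.
class GrowableMmapWriter {
 public:
  bool Open(const char* path, std::size_t initial_capacity) {
    assert(initial_capacity > 0);  // mmap rejects zero-length mappings
    fd_ = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd_ < 0) return false;
    if (::ftruncate(fd_, static_cast<off_t>(initial_capacity)) != 0) {
      Close();
      return false;
    }
    void* p = ::mmap(nullptr, initial_capacity, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd_, 0);
    if (p == MAP_FAILED) {
      Close();
      return false;
    }
    base_ = static_cast<char*>(p);
    capacity_ = initial_capacity;
    return true;
  }

  // Copies n bytes to the end of the mapping, growing it first if needed.
  bool Append(const void* data, std::size_t n) {
    if (size_ + n > capacity_ && !Grow(size_ + n)) return false;
    std::memcpy(base_ + size_, data, n);
    size_ += n;
    return true;
  }

  // Unmaps, trims the file to the bytes actually written, and closes it.
  bool Close() {
    bool ok = true;
    if (base_ != nullptr) {
      ok = (::munmap(base_, capacity_) == 0) && ok;
      base_ = nullptr;
    }
    if (fd_ >= 0) {
      ok = (::ftruncate(fd_, static_cast<off_t>(size_)) == 0) && ok;
      ok = (::close(fd_) == 0) && ok;
      fd_ = -1;
    }
    return ok;
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }

 private:
  bool Grow(std::size_t needed) {
    std::size_t new_cap = capacity_;
    while (new_cap < needed) new_cap *= 2;
    // Extend the file first so the enlarged mapping is fully backed.
    if (::ftruncate(fd_, static_cast<off_t>(new_cap)) != 0) return false;
    void* p = ::mremap(base_, capacity_, new_cap, MREMAP_MAYMOVE);
    if (p == MAP_FAILED) return false;
    base_ = static_cast<char*>(p);  // the mapping may have moved
    capacity_ = new_cap;
    return true;
  }

  int fd_ = -1;
  char* base_ = nullptr;
  std::size_t size_ = 0;
  std::size_t capacity_ = 0;
};
```

Because MREMAP_MAYMOVE allows the kernel to relocate the mapping, any pointer into the old region is invalid after a grow. That is the main cost of this design: callers can only hold offsets, not raw pointers, across Append() calls.
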