As for buffering data before making a call to write(): in Arrow 0.10.0
you'll be able to use BufferedOutputStream for this:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/buffered.h
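
To illustrate the idea only: the sketch below is not Arrow's BufferedOutputStream, and the class name BufferedWriter and its sink callback are made up for the example. It assumes the expensive operation (for instance a raw write() system call) is wrapped in a callback, and it coalesces small writes in memory before forwarding them in larger chunks:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper, NOT Arrow's BufferedOutputStream. It collects small
// writes in memory and forwards them to `sink` in larger chunks, so the
// expensive call behind `sink` happens less often.
class BufferedWriter {
 public:
  using Sink = std::function<void(const char*, std::size_t)>;

  BufferedWriter(Sink sink, std::size_t capacity)
      : sink_(std::move(sink)), capacity_(capacity) {
    assert(capacity_ > 0);
    buffer_.reserve(capacity_);
  }

  // Forward whatever is still pending when the writer goes away.
  ~BufferedWriter() { Flush(); }

  BufferedWriter(const BufferedWriter&) = delete;
  BufferedWriter& operator=(const BufferedWriter&) = delete;

  void Write(const char* data, std::size_t size) {
    // Payloads at least as large as the buffer bypass it; pending bytes
    // are flushed first so the output order is preserved.
    if (size >= capacity_) {
      Flush();
      sink_(data, size);
      return;
    }
    // Make room if this write would overflow the buffer.
    if (buffer_.size() + size > capacity_) Flush();
    buffer_.insert(buffer_.end(), data, data + size);
  }

  void Flush() {
    if (buffer_.empty()) return;
    sink_(buffer_.data(), buffer_.size());
    buffer_.clear();
  }

 private:
  Sink sink_;
  std::size_t capacity_;
  std::vector<char> buffer_;
};
```

With a capacity of a few tens of kilobytes, many small record-sized writes turn into a handful of calls to the underlying sink.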

Regards

Antoine.


On 09/05/2018 at 17:39, Ambalu, Robert wrote:
> I don’t have any offhand, no, but I would imagine that direct file writes 
> will at some point need to make a system call, which is expensive (fwrite 
> might buffer before eventually making the system call, but it looks like 
> FileOutputStream uses the raw system write for every write call).
> The current MMap IO interface isn’t usable as a streaming output, 
> unfortunately, though I suppose I could just implement my own.
> 
> -----Original Message-----
> From: Antoine Pitrou [mailto:solip...@pitrou.net] 
> Sent: Wednesday, May 09, 2018 11:11 AM
> To: dev@arrow.apache.org
> Subject: Re: Question about streaming to memorymapped files
> 
> 
> Do you know of any benchmark numbers / performance studies about this?
> While it's true that a memory-mapped file avoids explicit system calls,
> I've heard file I/O is quite well optimized, at least on Linux,
> nowadays.
> 
> Regards
> 
> Antoine.
> 
> 
> On Wed, 9 May 2018 14:47:53 +0000
> "Ambalu, Robert" <robert.amb...@point72.com> wrote:
>> Antoine, thanks for the quick reply.
>> You can actually grow memory-mapped files with an mremap call (and, I think, a 
>> seek/write on the file). I do this in my applications and it works fine.
>> I want the efficiency of writing via memory maps, so I would prefer to avoid 
>> FileOutputStream.
>>
>> -----Original Message-----
>> From: Antoine Pitrou [mailto:anto...@python.org] 
>> Sent: Wednesday, May 09, 2018 10:37 AM
>> To: dev@arrow.apache.org
>> Subject: Re: Question about streaming to memorymapped files
>>
>>
>> Hi,
>>
>> If you don't know the output size upfront then you should probably use a
>> FileOutputStream instead.  By definition, memory mapped files must have
>> a fixed size (since they are mapped to a fixed area in virtual memory).
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 09/05/2018 at 16:31, Ambalu, Robert wrote:
>>> Hey, I'm looking into streaming table updates into a memory-mapped file 
>>> (C++).
>>> I think I have everything I need (MemoryMappedFile output streamer, 
>>> RecordBatchStreamWriter) but I don't understand how to properly create the 
>>> memmap file.  It looks like it requires you to preset a size for the file 
>>> when you create it, but since I'll be streaming I don't actually know how 
>>> big a file I'm going to need...
>>> Am I missing some other API point here?  Is there any reason why the size is 
>>> required up front and the memmap doesn't auto-grow as needed?
>>>
>>> Thanks in advance
>>> - Rob
>>>
> 
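
For readers following the thread, the mremap-based growth Robert describes can be sketched roughly as below. This is an illustration under stated assumptions, not code from Arrow or from Robert's applications: the class name GrowableMmap is made up, the file is extended with ftruncate() rather than the seek/write he mentions, and mremap() is Linux-specific.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for mremap() on glibc
#endif
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <stdexcept>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical, Linux-only helper: an append-only file mapping that doubles
// its size on demand using ftruncate() + mremap().
class GrowableMmap {
 public:
  GrowableMmap(const char* path, std::size_t initial_size)
      : size_(initial_size) {
    assert(initial_size > 0);
    fd_ = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd_ < 0) throw std::runtime_error("open failed");
    // The file must be at least as large as the region we touch.
    if (::ftruncate(fd_, static_cast<off_t>(size_)) != 0) {
      ::close(fd_);
      throw std::runtime_error("ftruncate failed");
    }
    void* p = ::mmap(nullptr, size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, 0);
    if (p == MAP_FAILED) {
      ::close(fd_);
      throw std::runtime_error("mmap failed");
    }
    data_ = static_cast<char*>(p);
  }

  ~GrowableMmap() {
    ::munmap(data_, size_);
    // Best effort: trim the file back to the bytes actually written.
    if (::ftruncate(fd_, static_cast<off_t>(used_)) != 0) { /* ignored */ }
    ::close(fd_);
  }

  GrowableMmap(const GrowableMmap&) = delete;
  GrowableMmap& operator=(const GrowableMmap&) = delete;

  void Append(const void* src, std::size_t n) {
    if (used_ + n > size_) {
      std::size_t new_size = size_;
      while (used_ + n > new_size) new_size *= 2;
      Grow(new_size);
    }
    std::memcpy(data_ + used_, src, n);
    used_ += n;
  }

  const char* data() const { return data_; }
  std::size_t used() const { return used_; }
  std::size_t capacity() const { return size_; }

 private:
  void Grow(std::size_t new_size) {
    // Extend the file first, then enlarge the mapping. MREMAP_MAYMOVE lets
    // the kernel relocate the mapping, so old pointers become invalid.
    if (::ftruncate(fd_, static_cast<off_t>(new_size)) != 0)
      throw std::runtime_error("ftruncate failed");
    void* p = ::mremap(data_, size_, new_size, MREMAP_MAYMOVE);
    if (p == MAP_FAILED) throw std::runtime_error("mremap failed");
    data_ = static_cast<char*>(p);
    size_ = new_size;
  }

  int fd_ = -1;
  char* data_ = nullptr;
  std::size_t size_ = 0;
  std::size_t used_ = 0;
};
```

Because the mapping may move when it grows, callers should not keep raw pointers into it across Append() calls; offsets are safer.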
