[ https://issues.apache.org/jira/browse/AVRO-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254626#comment-17254626 ]
He Chen commented on AVRO-3013:
-------------------------------
I agree that the third option is the easiest. Still, there are some benefits to
having Avro do it. For one thing, it makes the library more consistent: the
problem and the solution are the same in Java and Python.
In Java, you basically need to call:
{code:java}
getFD().sync();
{code}
In Python, it becomes:
{code:python}
os.fsync(fd.fileno())
{code}
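A minimal sketch of the user-side workaround in Python, assuming a plain binary file opened with the built-in open(); the helper name write_and_sync is hypothetical. Note that fileno is a method, and that flush() has to come first so Python's own buffer reaches the OS before fsync runs:

{code:python}
import os


def write_and_sync(path: str, data: bytes) -> None:
    """Hypothetical helper: write data and ask the OS to persist it to the device."""
    with open(path, "wb") as f:
        f.write(data)
        # flush() only moves Python's userspace buffer into the OS page cache.
        f.flush()
        # os.fsync() takes an integer file descriptor, hence f.fileno().
        os.fsync(f.fileno())
{code}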
Another benefit is that it makes clear that flush does not guarantee the data
is written to a physical device. That would reduce the chance of users (like me)
introducing this hard-to-find bug.
If it really conflicts with Avro's domain and we don't need to keep the
library consistent here, I guess a more detailed description or docstring
for DataFileWriter's flush method would also be sufficient. It should at least
state something like:
{code:python}
"""
Flushing the stream may only guarantee that bytes previously written to the
stream are passed to the operating system for writing; it does not guarantee
that they are actually written to a physical device such as a disk drive.
"""
{code}
> Avro files should allow fsync-ing files to disk in Python
> ---------------------------------------------------------
>
> Key: AVRO-3013
> URL: https://issues.apache.org/jira/browse/AVRO-3013
> Project: Apache Avro
> Issue Type: New Feature
> Components: python
> Reporter: He Chen
> Priority: Major
>
> I am new to Apache, but here I am...
> In our use case, we need to constantly update an existing Avro file. The way
> we did it was to copy the old Avro file to a temporary file, append data
> to the temporary file, close the temporary file, and rename the temporary
> file to the original Avro file. This is problematic since closing a file does
> not guarantee that the data is written to disk. The resulting bug is hard to
> track down because it is hard to reproduce.
> I noticed that there is a ticket that addresses this for the Java client
> https://issues.apache.org/jira/browse/AVRO-1388. Why isn't it implemented for
> the Python client? If there are no objections, I'd like to submit a patch. Or
> perhaps I am missing something here? Please let me know!
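The copy/append/rename workflow from the issue can be made durable along these lines. This is a minimal stdlib-only sketch: update_atomically and its append callback are hypothetical names. With Avro, the callback would presumably open a DataFileWriter in append mode on the temp file and call its flush(), not close(), because the file object is assumed to stay open until the sync:

{code:python}
import os
import shutil
import tempfile
from typing import BinaryIO, Callable


def update_atomically(path: str, append: Callable[[BinaryIO], None]) -> None:
    """Hypothetical helper: copy, append, fsync, then atomically replace path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives next to the original so os.replace() stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "r+b") as tmp:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, tmp)
            append(tmp)  # assumed to leave tmp open
            # close() alone is not enough: push Python's buffer to the OS,
            # then ask the OS to persist the data before the rename.
            tmp.flush()
            os.fsync(tmp.fileno())
        shutil.copymode(path, tmp_path)  # mkstemp creates the file as 0600
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # On POSIX, fsync the directory so the rename itself survives a crash.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
{code}

If the callback fails, the original file is left untouched and the temp file is removed.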
--
This message was sent by Atlassian Jira
(v8.3.4#803005)