Benjamin Smith created SAMZA-968:
------------------------------------
Summary: SequenceFileHdfsFileWriter does not close file properly
Key: SAMZA-968
URL: https://issues.apache.org/jira/browse/SAMZA-968
Project: Samza
Issue Type: Bug
Components: container
Affects Versions: 0.10.0, 0.10.1
Reporter: Benjamin Smith
Priority: Minor
From Yi:
Hi, Benjamin,
Thanks a lot for reporting this! It makes sense from reading the posts.
Could you open a JIRA? Would you be interested in assigning it to yourself
and contributing the fix?
Thanks a lot again!
-Yi
> Hello,
>
> I am working on a project where we are integrating Samza and Hive. As part
> of this project, we ran into an issue where sequence files written from
> Samza were taking a long time (hours) to completely sync with HDFS.
>
> After some Googling and digging into the code, it appears that the issue
> is here:
>
> https://github.com/apache/samza/blob/master/samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala#L111
>
> Writer.stream(dfs.create(path)) implies that the caller of
> dfs.create(path) is responsible for closing the created stream explicitly.
> This doesn't happen, so when SequenceFileHdfsWriter calls close on the
> writer, the stream is only flushed and never closed.
>
> I believe the correct line should be:
>
> Writer.file(path)
>
> Or, SequenceFileHdfsWriter should explicitly track and close the stream.
>
> Thanks!
>
> Ben
>
> Reference material:
>
> http://stackoverflow.com/questions/27916872/why-the-sequencefile-is-truncated
>
> https://apache.googlesource.com/hadoop-common/+/HADOOP-6685/src/java/org/apache/hadoop/io/SequenceFile.java#1238
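
The ownership problem described in the report can be illustrated with a small, self-contained sketch. This is a toy model that uses only the JDK, not Hadoop's actual code: ToyWriter, TrackingStream and closedAfterWriterClose are hypothetical names, and the flush-versus-close behavior is modeled on what the report describes for Writer.stream(...) versus Writer.file(path).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamOwnershipSketch {

    /** In-memory stream that records whether close() was ever called on it. */
    static final class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /**
     * Hypothetical stand-in for a writer that either owns its stream
     * (analogous to Writer.file(path)) or borrows one from the caller
     * (analogous to Writer.stream(dfs.create(path))).
     */
    static final class ToyWriter {
        private final OutputStream out;
        private final boolean ownsStream;

        ToyWriter(OutputStream out, boolean ownsStream) {
            this.out = out;
            this.ownsStream = ownsStream;
        }

        void append(String record) throws IOException {
            out.write(record.getBytes(StandardCharsets.UTF_8));
        }

        /** Closes the stream only if this writer owns it; otherwise just flushes. */
        void close() throws IOException {
            if (ownsStream) {
                out.close();
            } else {
                out.flush();
            }
        }
    }

    /** Writes one record, closes the writer, and reports whether the stream was closed. */
    static boolean closedAfterWriterClose(boolean writerOwnsStream) {
        TrackingStream stream = new TrackingStream();
        ToyWriter writer = new ToyWriter(stream, writerOwnsStream);
        try {
            writer.append("record");
            writer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stream.closed;
    }

    public static void main(String[] args) {
        System.out.println("borrowed stream closed: " + closedAfterWriterClose(false));
        System.out.println("owned stream closed: " + closedAfterWriterClose(true));
    }
}
```

In the borrowed case the stream is still open after the writer is closed, so the caller has to keep a reference to it and close it explicitly. That corresponds to the second fix proposed in the report; the first fix avoids the problem by letting the writer own the stream.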