Benjamin Smith created SAMZA-968:
------------------------------------

             Summary: SequenceFileHdfsFileWriter does not close file properly
                 Key: SAMZA-968
                 URL: https://issues.apache.org/jira/browse/SAMZA-968
             Project: Samza
          Issue Type: Bug
          Components: container
    Affects Versions: 0.10.0, 0.10.1
            Reporter: Benjamin Smith
            Priority: Minor


From [email protected]:

Hi, Benjamin,

Thanks a lot for reporting this! It makes sense from reading the posts.
Could you open a JIRA? Would you be interested in assigning it to yourself
and contributing the fix?

Thanks a lot again!

-Yi

> Hello,
>
> I am working on a project where we are integrating Samza and Hive. As part
> of this project, we ran into an issue where sequence files written from
> Samza were taking a long time (hours) to completely sync with HDFS.
>
> After some Googling and digging into the code, it appears that the issue
> is here:
>
> https://github.com/apache/samza/blob/master/samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala#L111
>
> Writer.stream(dfs.create(path)) implies that the caller of
> dfs.create(path) is responsible for closing the created stream explicitly.
> This doesn't happen, and the SequenceFileHdfsWriter call to close will only
> flush the stream.
>
> I believe the correct line should be:
>
> Writer.file(path)
>
> Or, SequenceFileHdfsWriter should explicitly track and close the stream.
>
> Thanks!
>
> Ben
>
> Reference material:
>
> http://stackoverflow.com/questions/27916872/why-the-sequencefile-is-truncated
>
> https://apache.googlesource.com/hadoop-common/+/HADOOP-6685/src/java/org/apache/hadoop/io/SequenceFile.java#1238
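
The ownership problem described in the quoted report can be illustrated with a small, self-contained Java sketch. This is a simplified model, not Hadoop or Samza code: SketchWriter, TrackingStream, forStream and forOwnedStream are hypothetical names, and the only behavior assumed is what the report states, namely that a writer handed an existing stream just flushes it on close, while a writer that created the stream itself also closes it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class StreamOwnershipSketch {

    /** In-memory stream that records whether close() was called on it. */
    static class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical stand-in for a writer that may or may not own its stream. */
    static class SketchWriter implements AutoCloseable {
        private final OutputStream out;
        private final boolean ownsStream;

        private SketchWriter(OutputStream out, boolean ownsStream) {
            this.out = out;
            this.ownsStream = ownsStream;
        }

        /** Models the Writer.stream(...) case: the caller keeps ownership. */
        static SketchWriter forStream(OutputStream out) {
            return new SketchWriter(out, false);
        }

        /** Models the Writer.file(...) case: the writer owns the stream. */
        static SketchWriter forOwnedStream(OutputStream out) {
            return new SketchWriter(out, true);
        }

        @Override
        public void close() throws IOException {
            if (ownsStream) {
                out.close();   // owned stream: really closed
            } else {
                out.flush();   // borrowed stream: only flushed, left open
            }
        }
    }

    /** Closes only the writer and reports whether the stream ended up closed. */
    static boolean closedAfterWriterClose(boolean writerOwnsStream) throws IOException {
        TrackingStream stream = new TrackingStream();
        SketchWriter writer = writerOwnsStream
                ? SketchWriter.forOwnedStream(stream)
                : SketchWriter.forStream(stream);
        writer.close();
        return stream.closed;
    }

    /** Second fix from the report: the caller tracks and closes the stream. */
    static boolean closedWhenCallerTracksStream() throws IOException {
        TrackingStream stream = new TrackingStream();
        // Resources are closed in reverse order: writer first, then stream.
        try (TrackingStream tracked = stream;
             SketchWriter writer = SketchWriter.forStream(tracked)) {
            // records would be appended here
        }
        return stream.closed;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(closedAfterWriterClose(false)); // false: stream left open
        System.out.println(closedAfterWriterClose(true));  // true
        System.out.println(closedWhenCallerTracksStream()); // true
    }
}
```

The first call models the behavior reported above; the other two correspond to the two proposed fixes (letting the writer own the stream, or tracking and closing the stream explicitly).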



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
