Chang Zong created FLUME-3079:
---------------------------------

             Summary: HDFS sink using Snappy compression: .tmp file cannot be 
read correctly while data is still being written
                 Key: FLUME-3079
                 URL: https://issues.apache.org/jira/browse/FLUME-3079
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: 1.5.2
            Reporter: Chang Zong


I'm using the HDFS sink with the Snappy compression codec. While JSON events 
are being written into HDFS, a .snappy.tmp file is generated. If I try to 
access the data in that tmp file with Hive, I get a JSON parsing error.
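
For context, the sink is configured roughly like this (the agent, channel and 
path names below are placeholders, not the exact values from my setup):

    agent.sinks.hdfsSink.type = hdfs
    agent.sinks.hdfsSink.channel = memChannel
    agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events/%Y%m%d
    # write the raw JSON text through a compressed stream using Snappy
    agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
    agent.sinks.hdfsSink.hdfs.codeC = snappy
    agent.sinks.hdfsSink.hdfs.writeFormat = Text
    # while a file is still open it carries the default in-use suffix,
    # which produces names ending in .snappy.tmp
    agent.sinks.hdfsSink.hdfs.inUseSuffix = .tmp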

I think the reason is that the HDFS sink has already written Snappy-compressed 
content into the tmp file, but since the file has not been closed yet, the 
Snappy stream is incomplete and cannot be decoded by the Hive JSON SerDe. After 
the file is rolled over to a normal Snappy file, it can be processed correctly.

So is there a way to keep the data in plain text format while it is being 
written into the tmp file, and convert it to Snappy format only after the tmp 
file is rolled?
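
One workaround I am considering (assuming Hive and MapReduce still apply the 
default input filter that skips file names starting with "_" or ".") would be 
to mark in-use files so queries ignore them, e.g.:

    # hypothetical mitigation, not a fix for the tmp-file format itself:
    # files still being written get a leading underscore, so input formats
    # that skip "hidden" files will not try to parse the incomplete stream
    agent.sinks.hdfsSink.hdfs.inUsePrefix = _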



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
