[ https://issues.apache.org/jira/browse/FLUME-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943005#comment-15943005 ]
darkz edited comment on FLUME-1702 at 3/27/17 11:16 AM:
--------------------------------------------------------

I selected the .tmp data in Hive, and it caused an error:

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Illegal character ((CTRL-CHAR, code 1)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: java.io.ByteArrayInputStream@7730ef88; line: 1, column: 2]

I think the compressed file with the '.tmp' suffix is still in use and is not yet a complete compressed file, so the Hadoop codec could not recognize its content.

After all: yes, I use the "." prefix to skip ".tmp" files, but the Flume documentation does not mention it...


was (Author: darkz):
    Yes, I use the "." prefix to skip ".tmp" files, but the Flume documentation does not mention it...

> HDFSEventSink should write to a hidden file as opposed to a .tmp file
> ---------------------------------------------------------------------
>
>                 Key: FLUME-1702
>                 URL: https://issues.apache.org/jira/browse/FLUME-1702
>             Project: Flume
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 1.4.0
>
>         Attachments: bugFLUME-1702.patch, bugFLUME-1702.patch
>
>
> Currently we write to a .tmp file. The problem is that if MR jobs are being
> run on the directory we are writing to, then it's common for an MR job to
> list the directory, get a .tmp file, and then in the meantime the .tmp file
> is renamed, causing the job to fail when run.
> Using JavaMR you can use a PathFilter to avoid this; however, a custom
> solution is required for Pig, Hive, etc.
> Perhaps we should write to a hidden file so that MR never tries to process
> data in flight.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
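
The PathFilter workaround mentioned in the issue description can be sketched roughly as follows. This is a minimal, self-contained illustration, not Flume or Hadoop code: the class name `InFlightFileFilter`, its methods, and the sample file names are all hypothetical. In a real MapReduce job the same check would presumably live inside the `accept(Path)` method of an `org.apache.hadoop.fs.PathFilter`; here it works on plain file-name strings so that it runs without Hadoop on the classpath.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: decides which file names a job should read.
public class InFlightFileFilter {

    // Returns true when the file looks complete and safe to process.
    // Rejects hidden files (leading "." or "_") and files that still
    // carry the ".tmp" in-use suffix.
    public static boolean accept(String fileName) {
        return !fileName.startsWith(".")
            && !fileName.startsWith("_")
            && !fileName.endsWith(".tmp");
    }

    // Applies accept() to a directory listing, keeping the original order.
    public static List<String> visible(List<String> names) {
        return names.stream()
                    .filter(InFlightFileFilter::accept)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample listing; these file names are made up for illustration.
        List<String> listing = List.of(
            "FlumeData.1490612345.gz",       // finished file
            "FlumeData.1490612346.gz.tmp",   // in flight, ".tmp" suffix only
            ".FlumeData.1490612347.gz.tmp",  // in flight, hidden by "." prefix
            "_SUCCESS");                     // marker file
        System.out.println(visible(listing));
    }
}
```

With the "." prefix described in the comment, the in-flight file is already rejected by the leading-dot rule. That matches the convention Hadoop's `FileInputFormat` applies by default to names starting with "." or "_", which is why a hidden file needs no custom filter in Pig or Hive.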