[
https://issues.apache.org/jira/browse/CAMEL-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242515#comment-14242515
]
Willem Jiang commented on CAMEL-8040:
-------------------------------------
Hi Josef,
I just checked your Camel route: your file endpoint uses the default
fileExist setting, which is Override, so each write replaces the file. That
explains why Camel keeps writing each new chunk over the same file.
You can get it to work by changing the route like this:
{code}
<camel:route id="toFile" autoStartup="true">
<from uri="hdfs2:localhost:8020/tmp/camel-test/"/>
<log message="picked up file from hdfs with name
$simple{header.CamelFileName}"/>
<to uri="file:test-dest?fileExist=Append"/>
<log message="file downloaded from hadoop"/>
</camel:route>
{code}
Please check out this [page|https://camel.apache.org/file2] for more
information.
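The effect of the two fileExist modes can be illustrated with plain java.nio, outside of Camel (the file names and chunk contents below are made up for the demonstration; only the open options mirror what Override and Append do):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class FileExistDemo {

    // Mimics fileExist=Override (the Camel file endpoint's default):
    // every write truncates the target first, so only the last chunk survives.
    static String writeOverride(Path target, List<String> chunks) throws IOException {
        for (String chunk : chunks) {
            Files.write(target, chunk.getBytes(),
                    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
        }
        return new String(Files.readAllBytes(target));
    }

    // Mimics fileExist=Append: every write lands at the end of the target,
    // so all chunks survive in order.
    static String writeAppend(Path target, List<String> chunks) throws IOException {
        for (String chunk : chunks) {
            Files.write(target, chunk.getBytes(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        return new String(Files.readAllBytes(target));
    }

    public static void main(String[] args) throws IOException {
        List<String> chunks = List.of("chunk-1\n", "chunk-2\n", "chunk-3\n");
        System.out.print("Override keeps: "
                + writeOverride(Files.createTempFile("o", ".txt"), chunks));
        System.out.println("Append keeps: "
                + writeAppend(Files.createTempFile("a", ".txt"), chunks).replace("\n", " "));
    }
}
```

With Override, three writes leave only "chunk-3" in the file, which is exactly the symptom reported below: each HDFS chunk replaces the previous one.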
> camel-hdfs2 consumer overwriting data instead of appending them
> ---------------------------------------------------------------
>
> Key: CAMEL-8040
> URL: https://issues.apache.org/jira/browse/CAMEL-8040
> Project: Camel
> Issue Type: Bug
> Components: camel-hdfs
> Affects Versions: 2.13.0, 2.14.0
> Reporter: Josef Ludvíček
> Assignee: Willem Jiang
> Attachments: hdfs-reproducer.zip
>
>
> h1. camel-hdfs2 consumer overwriting data instead of appending them
> There is probably a bug in the Camel hdfs2 consumer.
> This project contains two Camel routes: one takes files from `test-source`
> and uploads them to Hadoop HDFS; the other watches a folder in HDFS and
> downloads the files to the `test-dest` folder in this project.
> It seems that when downloading a file from HDFS to the local filesystem, it
> keeps writing each chunk of data to the beginning of the target file instead
> of simply appending the chunks, as I would expect.
> From the Camel log I suppose that each chunk of data from the Hadoop file is
> treated as if it were a whole file.
> Ruby script `generate_textfile.rb` can generate file `test.txt` with content
> {code}
> 0 - line
> 1 - line
> 2 - line
> 3 - line
> 4 - line
> 5 - line
> ...
> ...
> 99999 - line
> {code}
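> The Ruby script itself is not included in the comment; a minimal equivalent
> sketch in Java (the file name and line count are assumed from the sample
> output above):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GenerateTextFile {

    // Writes lines "0 - line" .. "(count-1) - line", matching the format above.
    static Path generate(Path target, int count) throws IOException {
        String body = IntStream.range(0, count)
                .mapToObj(i -> i + " - line")
                .collect(Collectors.joining("\n", "", "\n"));
        return Files.write(target, body.getBytes());
    }

    public static void main(String[] args) throws IOException {
        generate(Paths.get("test.txt"), 100_000);
    }
}
```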
> h2. Scenario
> - _expects a running Hadoop instance on localhost:8020_
> - run mvn camel:run
> - copy test.txt into test-source
> - see the log and the file test.txt in test-dest
> - test.txt in the test-dest folder ends up containing only the last x lines
> of the original one.
>
>
> Camel log
> {code}
> [localhost:8020/tmp/camel-test/] toFile INFO picked up
> file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file
> downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile INFO picked up
> file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file
> downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile INFO picked up
> file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file
> downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile INFO picked up
> file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file
> downloaded from hadoop
> {code}
>
> h2. Environment
> * camel 2.14 and 2.13
> * hadoop VirtualBox VM
> ** downloaded from
> http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-2-x.html
> ** tested with version 2.3.0-cdh5.1.0,
> r8e266e052e423af592871e2dfe09d54c03f6a0e8, which I couldn't find on the
> download page
> * hadoop Docker image
> ** https://github.com/sequenceiq/hadoop-docker
> ** results were the same as with the VirtualBox VM
> In the case of the VirtualBox VM, HDFS binds to
> `hdfs://quickstart.cloudera:8020` by default, and this needs to be changed in
> `/etc/hadoop/conf/core-site.xml`. It should work when `fs.defaultFS` is set
> to `hdfs://0.0.0.0:8020`.
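> A minimal `core-site.xml` fragment matching that change might look like this
> (only the `fs.defaultFS` property is taken from the text above; any other
> properties in the file are left untouched):

```xml
<configuration>
  <!-- Bind HDFS to all interfaces so the Camel route can reach it from the host. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:8020</value>
  </property>
</configuration>
```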
> In the case of the Docker Hadoop image, first start the container, figure
> out its IP address, and use it for the Camel hdfs component.
> Here the Camel URI would be `hdfs:172.17.0.2:9000/tmp/camel-test`.
> {code}
> docker run -i -t sequenceiq/hadoop-docker:2.5.1 /etc/bootstrap.sh -bash
> Starting sshd: [ OK ]
> Starting namenodes on [966476255fc2]
> 966476255fc2: starting namenode, logging to
> /usr/local/hadoop/logs/hadoop-root-namenode-966476255fc2.out
> localhost: starting datanode, logging to
> /usr/local/hadoop/logs/hadoop-root-datanode-966476255fc2.out
> Starting secondary namenodes [0.0.0.0]
> 0.0.0.0: starting secondarynamenode, logging to
> /usr/local/hadoop/logs/hadoop-root-secondarynamenode-966476255fc2.out
> starting yarn daemons
> starting resourcemanager, logging to
> /usr/local/hadoop/logs/yarn--resourcemanager-966476255fc2.out
> localhost: starting nodemanager, logging to
> /usr/local/hadoop/logs/yarn-root-nodemanager-966476255fc2.out
> {code}
> See which IP the HDFS filesystem API is bound to inside the Docker container:
> {code}
> bash-4.1# netstat -tulnp
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address Foreign Address
> State PID/Program name
> ...
> tcp 0 0 172.17.0.2:9000 0.0.0.0:*
> LISTEN -
> ...
> {code}
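> From the host, it is worth confirming that the port from the netstat output
> is actually reachable before pointing Camel at it; a small sketch (the IP and
> port below are the ones from the output above, so adjust to your container):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Container IP and HDFS port taken from the netstat output above.
        System.out.println(reachable("172.17.0.2", 9000, 2000)
                ? "hdfs port reachable" : "hdfs port not reachable");
    }
}
```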
> There might be an exception because of HDFS permissions. It can be fixed by
> loosening the HDFS filesystem permissions:
> {code}
> bash-4.1# /usr/local/hadoop/bin/hdfs dfs -chmod 777 /
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)