[
https://issues.apache.org/jira/browse/CAMEL-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242546#comment-14242546
]
Josef Ludvíček edited comment on CAMEL-8040 at 12/11/14 2:09 PM:
-----------------------------------------------------------------
Hi Willem,
yeah, but the docs say that "Override, which is the default, replaces the
existing *file*."
But from what I see, it is replacing *chunks of that file*, so in the end I
don't even have a valid file, just the last data chunk of the original file
from Hadoop. If it were a picture, it would be corrupted.
It looks like Camel handles each data chunk (of size bufferSize, default 4096)
as if it were the whole file.
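To make the difference concrete, here is a plain-shell sketch (nothing to do with Camel's actual IO code): writing every chunk in overwrite mode truncates the target each time, so only the last chunk survives, while appending reproduces the whole file.

```shell
# Plain-shell illustration (not Camel code): each ">" write truncates the
# target file, so only the last chunk survives -- the behavior I'm seeing.
workdir=$(mktemp -d)

printf 'chunk-1\n' >  "$workdir/overwritten.txt"
printf 'chunk-2\n' >  "$workdir/overwritten.txt"
printf 'chunk-3\n' >  "$workdir/overwritten.txt"
cat "$workdir/overwritten.txt"    # only "chunk-3" is left

# Appending each chunk (">>") reconstructs the whole original file,
# which is what I would expect when downloading one file from HDFS.
printf 'chunk-1\n' >  "$workdir/appended.txt"
printf 'chunk-2\n' >> "$workdir/appended.txt"
printf 'chunk-3\n' >> "$workdir/appended.txt"
cat "$workdir/appended.txt"       # chunk-1, chunk-2, chunk-3
```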
> camel-hdfs2 consumer overwriting data instead of appending them
> ---------------------------------------------------------------
>
> Key: CAMEL-8040
> URL: https://issues.apache.org/jira/browse/CAMEL-8040
> Project: Camel
> Issue Type: Bug
> Components: camel-hdfs
> Affects Versions: 2.13.0, 2.14.0
> Reporter: Josef Ludvíček
> Assignee: Willem Jiang
> Attachments: hdfs-reproducer.zip
>
>
> h1. camel-hdfs2 consumer overwriting data instead of appending them
> There is probably a bug in the Camel hdfs2 consumer.
> The attached project contains two Camel routes: one takes files from
> `test-source` and uploads them to Hadoop HDFS,
> the other watches a folder in Hadoop HDFS and downloads the files into the
> `test-dest` folder of the project.
> It seems that when downloading a file from HDFS to the local filesystem, Camel
> keeps writing chunks of data to the beginning of the target file in test-dest,
> instead of simply appending the chunks, as I would expect.
> From the Camel log I suppose that each chunk of data from the Hadoop file is
> treated as if it were the whole file.
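> The two routes could look roughly like this in Camel's Spring XML DSL (a
> sketch reconstructed from the description above, not copied from the attached
> reproducer; endpoint options are omitted and the exact URIs are assumptions):
> {code}
> <camelContext xmlns="http://camel.apache.org/schema/spring">
>   <!-- upload: local test-source folder to HDFS -->
>   <route>
>     <from uri="file:test-source"/>
>     <to uri="hdfs2://localhost:8020/tmp/camel-test"/>
>   </route>
>   <!-- download: HDFS folder to local test-dest (where the overwriting shows up) -->
>   <route>
>     <from uri="hdfs2://localhost:8020/tmp/camel-test"/>
>     <to uri="file:test-dest"/>
>   </route>
> </camelContext>
> {code}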
> The Ruby script `generate_textfile.rb` generates a file `test.txt` with the content
> {code}
> 0 - line
> 1 - line
> 2 - line
> 3 - line
> 4 - line
> 5 - line
> ...
> ...
> 99999 - line
> {code}
> h2. Scenario
> - _expects a running Hadoop instance on localhost:8020_
> - run `mvn camel:run`
> - copy test.txt into test-source
> - see the log and the file test.txt in test-dest
> - test.txt in the test-dest folder will contain only the last x lines of the
> original one.
>
>
> Camel log
> {code}
> [localhost:8020/tmp/camel-test/] toFile INFO picked up
> file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file
> downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile INFO picked up
> file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file
> downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile INFO picked up
> file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file
> downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile INFO picked up
> file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file
> downloaded from hadoop
> {code}
>
> h2. Environment
> * Camel 2.14 and 2.13
> * Hadoop VirtualBox VM
> ** downloaded from
> http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-2-x.html
> ** tested with version 2.3.0-cdh5.1.0,
> r8e266e052e423af592871e2dfe09d54c03f6a0e8, which I couldn't find on the
> download page
> * Hadoop Docker image
> ** https://github.com/sequenceiq/hadoop-docker
> ** results were the same as with the VirtualBox VM
> In case of the VirtualBox VM, HDFS by default binds to
> `hdfs://quickstart.cloudera:8020`, and this needs to be changed in
> `/etc/hadoop/conf/core-site.xml`. It should work when `fs.defaultFS` is set
> to `hdfs://0.0.0.0:8020`.
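> For reference, the relevant snippet of `core-site.xml` would then be (a
> sketch showing only the `fs.defaultFS` property):
> {code}
> <property>
>   <name>fs.defaultFS</name>
>   <value>hdfs://0.0.0.0:8020</value>
> </property>
> {code}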
> In case of the Docker Hadoop image, first start the Docker container, figure
> out its IP address, and use it for the Camel HDFS component. Here the Camel
> URI would be `hdfs:172.17.0.2:9000/tmp/camel-test`.
> {code}
> docker run -i -t sequenceiq/hadoop-docker:2.5.1 /etc/bootstrap.sh -bash
> Starting sshd: [ OK ]
> Starting namenodes on [966476255fc2]
> 966476255fc2: starting namenode, logging to
> /usr/local/hadoop/logs/hadoop-root-namenode-966476255fc2.out
> localhost: starting datanode, logging to
> /usr/local/hadoop/logs/hadoop-root-datanode-966476255fc2.out
> Starting secondary namenodes [0.0.0.0]
> 0.0.0.0: starting secondarynamenode, logging to
> /usr/local/hadoop/logs/hadoop-root-secondarynamenode-966476255fc2.out
> starting yarn daemons
> starting resourcemanager, logging to
> /usr/local/hadoop/logs/yarn--resourcemanager-966476255fc2.out
> localhost: starting nodemanager, logging to
> /usr/local/hadoop/logs/yarn-root-nodemanager-966476255fc2.out
> {code}
> Check which IP the HDFS filesystem API is bound to inside the Docker container
> {code}
> bash-4.1# netstat -tulnp
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address Foreign Address
> State PID/Program name
> ...
> tcp 0 0 172.17.0.2:9000 0.0.0.0:*
> LISTEN -
> ...
> {code}
> There might be an Exception because of HDFS permissions. It can be solved by
> setting the HDFS filesystem permissions.
> {code}
> bash-4.1# /usr/local/hadoop/bin/hdfs dfs -chmod 777 /
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)