GitHub user ClaudiaPHI opened a pull request:
https://github.com/apache/metamodel/pull/36
Changed the HdfsResource implementation so that writing to Hadoop is possible.
This is part of a story from DataCleaner:
https://github.com/datacleaner/DataCleaner/issues/494.
In Hadoop the default replication factor is 3. After a file is created and
closed, it becomes immutable because of the replicas; in practice this means
that append is not allowed.
With this change it is possible to write a CSV file to Hadoop by setting
the replication factor to 1 in the configuration.
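For illustration, here is a minimal sketch of what creating a file with
replication factor 1 can look like through the Hadoop FileSystem API. This is
not the PR's actual code; the namenode address and file path are placeholders.

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCreateSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000"); // placeholder namenode
            conf.set("dfs.replication", "1"); // single replica, as described above

            // Create the CSV file and write an initial header plus one row.
            try (FileSystem fs = FileSystem.get(conf);
                 FSDataOutputStream out = fs.create(new Path("/tmp/example.csv"))) {
                out.write("id,name\n1,foo\n".getBytes(StandardCharsets.UTF_8));
            }
        }
    }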
However, writing is very slow, and it can fail at any time if the Hadoop node
is leased or other types of failures occur.
The "CsvUpdateCallback" class appends row by row.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ClaudiaPHI/metamodel
feature/Hdfs-resourse-impl-for-DC-494
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/metamodel/pull/36.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #36
----
commit cb82f181b2bf031f6c6c87c58c4800d06913d092
Author: ClaudiaPHI <[email protected]>
Date: 2015-07-31T09:33:26Z
Changed the implementation so that writing to Hadoop is possible.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and you wish to enable it, or if the feature is enabled but not
working, please contact infrastructure at [email protected] or file a
JIRA ticket with INFRA.
---