New reply on DataCleaner's online discussion forum (https://datacleaner.org/forum):
Dennis replied to subject 'Runnin DataCleaner on Spark as a local file'
-------------------
Hi Henrique,

It seems our base library for HDFS resources is a little out of sync, so we're still using the obsolete "s3:" scheme for saving to S3. There's a chance it won't work unless you're running a pretty ancient Hadoop (2.2 or earlier). However, if you're running on Amazon EMR, I think it will use Amazon's own implementation for S3, in which case it should work.

There is also a chance that you can use the "s3a:" scheme (Hadoop 2.7+) or the "s3n:" scheme (Hadoop 2.3-2.6), since we're already forcing it to use an HDFS resource. However, we have never tried using DataCleaner with S3 on Hadoop, so I don't really have any instructions for you, except Hadoop's own overview: https://wiki.apache.org/hadoop/AmazonS3

I'd love to hear how it goes and what issues you run into. I don't think S3 is going to be a primary objective for us, but if we can take steps to make it easier to use, we'd love to do that.

BR,
Dennis
-------------------
View the topic online to reply - go to https://datacleaner.org/topic/1140/Runnin-DataCleaner-on-Spark-as-a-local-file
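For anyone who wants to try the "s3a:" route before wiring it into a DataCleaner-on-Spark job, below is a minimal sketch that only uses Hadoop's own FileSystem API to check that an S3 path resolves. It is not part of DataCleaner; it assumes Hadoop 2.7+ with the hadoop-aws module on the classpath, and the bucket name, file path, and environment variables are placeholders.

// Minimal sketch: verify an "s3a://" path resolves via Hadoop's FileSystem API.
// Assumes Hadoop 2.7+ and the hadoop-aws module on the classpath; the bucket,
// path, and credential sources below are hypothetical.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3aPathCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Credentials for the s3a connector; on Amazon EMR these are usually
        // picked up from the instance role instead of being set explicitly.
        conf.set("fs.s3a.access.key", System.getenv("AWS_ACCESS_KEY_ID"));
        conf.set("fs.s3a.secret.key", System.getenv("AWS_SECRET_ACCESS_KEY"));

        URI bucket = URI.create("s3a://my-bucket/");                // hypothetical bucket
        Path file = new Path("s3a://my-bucket/datacleaner/output.csv"); // hypothetical path

        try (FileSystem fs = FileSystem.get(bucket, conf)) {
            System.out.println("Path exists: " + fs.exists(file));
        }
    }
}

If that check works from the same cluster, there is at least a reasonable chance the same "s3a://" URL can be handed to the HDFS resource mentioned above; on Hadoop 2.3-2.6 the equivalent would be the "s3n:" connector with its own credential keys.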
