New reply on DataCleaner's online discussion forum (https://datacleaner.org/forum):
Dennis replied to subject 'Runnin DataCleaner on Spark as a local file'
-------------------
Hi Henrique,

It seems our base library for HDFS resources is a little out of sync, so we're still using the obsolete "s3:" scheme for saving to S3. There's a chance it won't work unless you're running a pretty ancient Hadoop (2.2 or earlier). However, if you're running on Amazon EMR, I think it will use Amazon's own implementation for S3, in which case it should work.

There is also a chance that you can use the "s3a:" scheme (Hadoop 2.7+) or the "s3n:" scheme (Hadoop 2.3-2.6), since we're already forcing it to use an HDFS resource. However, we have never tried using DataCleaner with S3 on Hadoop, so I don't really have any instructions for you, except Hadoop's own overview: https://wiki.apache.org/hadoop/AmazonS3

I'd love to hear how it goes and what issues you run into. I don't think S3 is going to be a primary objective for us, but if we can take steps to make it easier to use, we'd love to do that.

BR,
Dennis
-------------------
View the topic online to reply - go to https://datacleaner.org/topic/1140/Runnin-DataCleaner-on-Spark-as-a-local-file
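For anyone who wants to try the "s3a:" route before wiring it into a DataCleaner-on-Spark job, below is a minimal sketch that only uses Hadoop's own FileSystem API to check that an S3 path resolves. It is not part of DataCleaner; it assumes Hadoop 2.7+ with the hadoop-aws module on the classpath, and the bucket name, file path, and environment variables are placeholders.

// Minimal sketch: verify an "s3a://" path resolves via Hadoop's FileSystem API.
// Assumes Hadoop 2.7+ and the hadoop-aws module on the classpath; the bucket,
// path, and credential sources below are hypothetical.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3aPathCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Credentials for the s3a connector; on Amazon EMR these are usually
        // picked up from the instance role instead of being set explicitly.
        conf.set("fs.s3a.access.key", System.getenv("AWS_ACCESS_KEY_ID"));
        conf.set("fs.s3a.secret.key", System.getenv("AWS_SECRET_ACCESS_KEY"));

        URI bucket = URI.create("s3a://my-bucket/");                // hypothetical bucket
        Path file = new Path("s3a://my-bucket/datacleaner/output.csv"); // hypothetical path

        try (FileSystem fs = FileSystem.get(bucket, conf)) {
            System.out.println("Path exists: " + fs.exists(file));
        }
    }
}

If that check works from the same cluster, there is at least a reasonable chance the same "s3a://" URL can be handed to the HDFS resource mentioned above; on Hadoop 2.3-2.6 the equivalent would be the "s3n:" connector with its own credential keys.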
