New reply on DataCleaner's online discussion forum (https://datacleaner.org/forum):
Henrique replied to subject 'Runnin DataCleaner on Spark as a local file'

-------------------

I just saw several references to YARN in the code. Is it supposed to run only in YARN mode?

/Users/henriqueandrade/Documents/App/spark/spark-1.6.2-bin-hadoop2.6/bin/spark-submit \
  --class org.datacleaner.spark.Main \
  --master local[1] \
  DataCleaner-env-spark-5.1.3-SNAPSHOT-jar-with-dependencies.jar \
  conf_local.xml \
  vanilla-job.analysis.xml \
  jobAbsolutePath.properties

jobAbsolutePath.properties:

datacleaner.result.hdfs.path=s3n://exceed-ingestion/results/myresult.analysis.result.dat

conf_local.xml:

<configuration xmlns="http://eobjects.org/analyzerbeans/configuration/1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <datastore-catalog>
    <csv-datastore name="person_names">
      <filename>file:///Users/henriqueandrade/Documents/App/spark/DataCleaner/engine/env/spark/target/person_names.txt</filename>
      <quote-char>"</quote-char>
      <separator-char>,</separator-char>
      <escape-char>\</escape-char>
      <encoding>UTF-8</encoding>
      <fail-on-inconsistencies>true</fail-on-inconsistencies>
      <multiline-values>false</multiline-values>
      <header-line-number>1</header-line-number>
    </csv-datastore>
    <json-datastore name="person_data">
      <filename>./person_data.json</filename>
    </json-datastore>
  </datastore-catalog>
</configuration>

vanilla-job.analysis.xml:

<?xml version="1.0" encoding="UTF-8"?>
<job xmlns="http://eobjects.org/analyzerbeans/job/1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <source>
    <data-context ref="person_names" />
    <columns>
      <column id="col_id" path="id" />
      <column id="col_name" path="name" />
      <column id="col_company" path="company" />
      <column id="col_country" path="country" />
    </columns>
  </source>
  <analysis>
    <analyzer>
      <descriptor ref="String analyzer" />
      <input ref="col_company" />
      <properties/>
    </analyzer>
    <analyzer>
      <descriptor ref="Value distribution" />
      <input ref="col_country" />
      <properties>
        <property name="Record unique values" value="true"/>
        <property name="Record drill-down information" value="true"/>
        <property name="Top n most frequent values" value="&lt;null&gt;"/>
        <property name="Bottom n most frequent values" value="&lt;null&gt;"/>
      </properties>
    </analyzer>
  </analysis>
</job>

-------------------

View the topic online to reply: https://datacleaner.org/topic/1140/Runnin-DataCleaner-on-Spark-as-a-local-file
