[ https://issues.apache.org/jira/browse/CRUNCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158193#comment-15158193 ]
Micah Whitacre commented on CRUNCH-586:
---------------------------------------

+1 for the source side. [~joshwills] what were you thinking for the write side?

> SparkPipeline does not work with HBaseSourceTarget
> --------------------------------------------------
>
>                 Key: CRUNCH-586
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-586
>             Project: Crunch
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 0.13.0
>            Reporter: Stefan De Smit
>            Assignee: Josh Wills
>         Attachments: CRUNCH-586.patch
>
>
> final Pipeline pipeline = new SparkPipeline("local", "crunchhbase", HBaseInputSource.class, conf);
> final PTable<ImmutableBytesWritable, Result> read = pipeline.read(new HBaseSourceTarget("t1", new Scan()));
>
> returns an empty table, while it works with MRPipeline.
>
> The root cause is the combination of Spark's getJavaRDDLike method:
>
> source.configureSource(job, -1);
> Converter converter = source.getConverter();
> JavaPairRDD<?, ?> input = runtime.getSparkContext().newAPIHadoopRDD(
>     job.getConfiguration(),
>     CrunchInputFormat.class,
>     converter.getKeyClass(),
>     converter.getValueClass());
>
> which hard-codes CrunchInputFormat.class (and always passes -1),
>
> and the HBase configureSource method:
>
> if (inputId == -1) {
>   job.setMapperClass(CrunchMapper.class);
>   job.setInputFormatClass(inputBundle.getFormatClass());
>   inputBundle.configure(conf);
> } else {
>   Path dummy = new Path("/hbase/" + table);
>   CrunchInputs.addInputPath(job, dummy, inputBundle, inputId);
> }
>
> The easiest solution I see is to always call CrunchInputs.addInputPath in every source.
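For illustration only, a rough sketch of what that direction might look like in the HBase source's configureSource, assuming CrunchInputs.addInputPath can safely be called with an inputId of -1 (the attached CRUNCH-586.patch may take a different approach):

    @Override
    public void configureSource(Job job, int inputId) throws IOException {
      // Keep the existing single-input setup used by the MR planner...
      if (inputId == -1) {
        job.setMapperClass(CrunchMapper.class);
        job.setInputFormatClass(inputBundle.getFormatClass());
        inputBundle.configure(job.getConfiguration());
      }
      // ...but also register the input with CrunchInputs unconditionally, so that
      // CrunchInputFormat (which the Spark runtime always uses, passing -1) can
      // still locate the HBase table's format bundle.
      Path dummy = new Path("/hbase/" + table);
      CrunchInputs.addInputPath(job, dummy, inputBundle, inputId);
    }

With something along those lines, the SparkPipeline example above should read rows from "t1" the same way MRPipeline does, since the scan configuration is no longer dropped on the -1 code path.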