Stefan De Smit created CRUNCH-586:
-------------------------------------

                 Summary: SparkPipeline does not work with HBaseSourceTarget
                     Key: CRUNCH-586
                     URL: https://issues.apache.org/jira/browse/CRUNCH-586
                 Project: Crunch
              Issue Type: Bug
              Components: Spark
        Affects Versions: 0.13.0
                Reporter: Stefan De Smit
    final Pipeline pipeline = new SparkPipeline("local", "crunchhbase", HBaseInputSource.class, conf);
    final PTable<ImmutableBytesWritable, Result> read = pipeline.read(new HBaseSourceTarget("t1", new Scan()));

returns an empty table, while the same code works with MRPipeline.

The root cause is the combination of Spark's getJavaRDDLike method:

    source.configureSource(job, -1);
    Converter converter = source.getConverter();
    JavaPairRDD<?, ?> input = runtime.getSparkContext().newAPIHadoopRDD(
        job.getConfiguration(),
        CrunchInputFormat.class,
        converter.getKeyClass(),
        converter.getValueClass());

which always reads through CrunchInputFormat.class (and always passes inputId = -1), and HBase's configureSource method:

    if (inputId == -1) {
      job.setMapperClass(CrunchMapper.class);
      job.setInputFormatClass(inputBundle.getFormatClass());
      inputBundle.configure(conf);
    } else {
      Path dummy = new Path("/hbase/" + table);
      CrunchInputs.addInputPath(job, dummy, inputBundle, inputId);
    }

When inputId is -1, the HBase source configures the job's own input format instead of registering the input with CrunchInputs, so CrunchInputFormat finds nothing to read.

The easiest solution I see is to always call CrunchInputs.addInputPath, in every source.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
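The mismatch can be illustrated with a self-contained sketch. The class, field, and method names below are hypothetical stand-ins for the Crunch internals quoted above, not the real API; the point is only that when inputId == -1 nothing is registered with CrunchInputs, so a reader that consults only that registry (as CrunchInputFormat does) sees no inputs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the bug: SparkPipeline always calls
// configureSource with inputId = -1, then reads through CrunchInputFormat,
// which only sees inputs registered via CrunchInputs.addInputPath.
public class CrunchSparkHBaseBugSketch {

    // stand-in for the registry that CrunchInputs.addInputPath populates
    static final Map<String, String> registeredInputs = new HashMap<>();

    // stand-in for HBaseSourceTarget.configureSource's current branching
    static void configureSource(String table, int inputId) {
        if (inputId == -1) {
            // MR path: sets the job's input format class directly;
            // nothing is registered with CrunchInputs
        } else {
            // multi-input path: registers a dummy path with the registry
            registeredInputs.put("/hbase/" + table, "TableInputFormat");
        }
    }

    public static void main(String[] args) {
        configureSource("t1", -1); // what SparkPipeline effectively does today
        // CrunchInputFormat consults the registry, finds nothing -> empty table
        System.out.println(registeredInputs.isEmpty()); // true

        configureSource("t1", 0); // proposed fix: always register the input
        System.out.println(registeredInputs.containsKey("/hbase/t1")); // true
    }
}
```

Under this reading, unconditionally registering the input (the "else" branch) would let both MRPipeline and SparkPipeline find the HBase source through the same registry.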