I found GORA-386 (Gora Spark Backend Support).

Should the discussion be continued there?

Cheers

On Wed, Aug 26, 2015 at 7:02 AM, Ted Malaska <ted.mala...@cloudera.com>
wrote:

> Where is the input format class? Whenever I use the search on your
> GitHub it says "We couldn’t find any issues matching 'GoraInputFormat'"
>
>
>
> On Wed, Aug 26, 2015 at 9:48 AM, Furkan KAMACI <furkankam...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Here is the MapReduceTestUtils.testSparkWordCount()
>>
>>
>> https://github.com/kamaci/gora/blob/master/gora-core/src/test/java/org/apache/gora/mapreduce/MapReduceTestUtils.java#L108
>>
>> Here is SparkWordCount
>>
>>
>> https://github.com/kamaci/gora/blob/8f1acc6d4ef6c192e8fc06287558b7bc7c39b040/gora-core/src/examples/java/org/apache/gora/examples/spark/SparkWordCount.java
>>
>> Lastly, here is GoraSparkEngine:
>>
>>
>> https://github.com/kamaci/gora/blob/master/gora-core/src/main/java/org/apache/gora/spark/GoraSparkEngine.java
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>> On Wed, Aug 26, 2015 at 4:40 PM, Ted Malaska <ted.mala...@cloudera.com>
>> wrote:
>>
>>> Where can I find the code for MapReduceTestUtils.testSparkWordCount?
>>>
>>> On Wed, Aug 26, 2015 at 9:29 AM, Furkan KAMACI <furkankam...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Here is the test method I've ignored due to the Connection Refused
>>>> failure:
>>>>
>>>>
>>>> https://github.com/kamaci/gora/blob/master/gora-hbase/src/test/java/org/apache/gora/hbase/mapreduce/TestHBaseStoreWordCount.java#L65
>>>>
>>>> I've implemented a Spark backend for Apache Gora as a GSoC project, and
>>>> this is the last obstacle I need to solve. If you can help me, you are
>>>> welcome.
>>>>
>>>> Kind Regards,
>>>> Furkan KAMACI
>>>>
>>>> On Wed, Aug 26, 2015 at 3:45 PM, Ted Malaska <ted.mala...@cloudera.com>
>>>> wrote:
>>>>
>>>>> I've always used HBaseTestingUtility and never really had much
>>>>> trouble. I use that for all my unit testing between Spark and HBase.
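>>>>>
>>>>> For reference, a minimal sketch of the HBaseTestingUtility pattern I mean
>>>>> (the wrapper class is just an illustration, not code from Gora):
>>>>>
>>>>> import org.apache.hadoop.conf.Configuration;
>>>>> import org.apache.hadoop.hbase.HBaseTestingUtility;
>>>>>
>>>>> public class MiniClusterSketch {
>>>>>   public static void main(String[] args) throws Exception {
>>>>>     // Spin up an in-process HBase mini cluster (HDFS + ZooKeeper + HBase).
>>>>>     HBaseTestingUtility htu = new HBaseTestingUtility();
>>>>>     htu.startMiniCluster();
>>>>>
>>>>>     // This Configuration carries the mini cluster's ZooKeeper quorum and
>>>>>     // (random) client port; pass exactly this object on to Spark.
>>>>>     Configuration conf = htu.getConfiguration();
>>>>>     System.out.println("ZK port: "
>>>>>         + conf.get("hbase.zookeeper.property.clientPort"));
>>>>>
>>>>>     htu.shutdownMiniCluster();
>>>>>   }
>>>>> }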
>>>>>
>>>>> Here are some code examples if you're interested:
>>>>>
>>>>> --Main HBase-Spark Module
>>>>> https://github.com/apache/hbase/tree/master/hbase-spark
>>>>>
>>>>> --Unit test that covers all the basic connections
>>>>>
>>>>> https://github.com/apache/hbase/blob/master/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseContextSuite.scala
>>>>>
>>>>> --If you want to look at the old stuff before it went into HBase
>>>>> https://github.com/cloudera-labs/SparkOnHBase
>>>>>
>>>>> Let me know if that helps
>>>>>
>>>>> On Wed, Aug 26, 2015 at 5:40 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>
>>>>>> Can you log the contents of the Configuration you pass from Spark ?
>>>>>> The output would give you some clue.
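>>>>>>
>>>>>> Something along these lines would do (just a sketch; the keys below are
>>>>>> the ones the ZooKeeper connection depends on):
>>>>>>
>>>>>> // e.g. right before sparkContext.newAPIHadoopRDD(...):
>>>>>> System.out.println("hbase.zookeeper.quorum = "
>>>>>>     + conf.get("hbase.zookeeper.quorum"));
>>>>>> System.out.println("hbase.zookeeper.property.clientPort = "
>>>>>>     + conf.get("hbase.zookeeper.property.clientPort"));
>>>>>>
>>>>>> // or dump everything (Configuration is Iterable<Map.Entry<String, String>>):
>>>>>> for (java.util.Map.Entry<String, String> e : conf) {
>>>>>>   System.out.println(e.getKey() + " = " + e.getValue());
>>>>>> }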
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Aug 26, 2015, at 2:30 AM, Furkan KAMACI <furkankam...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Ted,
>>>>>>
>>>>>> I'll check the ZooKeeper connection, but another test method that runs on
>>>>>> HBase without Spark works without any error. The HBase version is
>>>>>> 0.98.8-hadoop2 and I use Spark 1.3.1.
>>>>>>
>>>>>> Kind Regards,
>>>>>> Furkan KAMACI
>>>>>> On 26 Aug 2015 12:08, "Ted Yu" <yuzhih...@gmail.com> wrote:
>>>>>>
>>>>>>> The connection failure was to ZooKeeper.
>>>>>>>
>>>>>>> Have you verified that localhost:2181 can serve requests?
>>>>>>> What version of HBase was Gora built against?
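>>>>>>>
>>>>>>> A quick way to check that port from Java (just a sketch; ZooKeeper
>>>>>>> replies "imok" to the four-letter command "ruok"):
>>>>>>>
>>>>>>> try (java.net.Socket s = new java.net.Socket("localhost", 2181)) {
>>>>>>>   // Send ZooKeeper's "ruok" health-check command and read the reply.
>>>>>>>   s.getOutputStream().write("ruok".getBytes());
>>>>>>>   s.getOutputStream().flush();
>>>>>>>   byte[] buf = new byte[4];
>>>>>>>   int n = s.getInputStream().read(buf);
>>>>>>>   System.out.println("ZK replied: " + new String(buf, 0, Math.max(n, 0)));
>>>>>>> } catch (java.io.IOException e) {
>>>>>>>   System.out.println("Nothing serving on localhost:2181: " + e);
>>>>>>> }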
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Aug 26, 2015, at 1:50 AM, Furkan KAMACI <furkankam...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I start an HBase cluster for my test class. I use this helper class:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/apache/gora/blob/master/gora-hbase/src/test/java/org/apache/gora/hbase/util/HBaseClusterSingleton.java
>>>>>>>
>>>>>>> and I use it like this:
>>>>>>>
>>>>>>> private static final HBaseClusterSingleton cluster =
>>>>>>> HBaseClusterSingleton.build(1);
>>>>>>>
>>>>>>> I retrieve the configuration object as follows:
>>>>>>>
>>>>>>> cluster.getConf()
>>>>>>>
>>>>>>> and I use it in Spark as follows:
>>>>>>>
>>>>>>> sparkContext.newAPIHadoopRDD(conf, MyInputFormat.class, clazzK,
>>>>>>>     clazzV);
>>>>>>>
>>>>>>> When I run my test there is no need to start up an HBase cluster,
>>>>>>> because Spark will connect to my dummy cluster. However, when I run my
>>>>>>> test method it throws an error:
>>>>>>>
>>>>>>> 2015-08-26 01:19:59,558 INFO [Executor task launch
>>>>>>> worker-0-SendThread(localhost:2181)] zookeeper.ClientCnxn
>>>>>>> (ClientCnxn.java:logStartConnect(966)) - Opening socket connection to
>>>>>>> server localhost/127.0.0.1:2181. Will not attempt to authenticate
>>>>>>> using SASL (unknown error)
>>>>>>>
>>>>>>> 2015-08-26 01:19:59,559 WARN [Executor task launch
>>>>>>> worker-0-SendThread(localhost:2181)] zookeeper.ClientCnxn
>>>>>>> (ClientCnxn.java:run(1089)) - Session 0x0 for server null, unexpected
>>>>>>> error, closing socket connection and attempting reconnect
>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>>>>>>>
>>>>>>> HBase tests, which do not run on Spark, work well. When I check the logs
>>>>>>> I see that the cluster and Spark are started up correctly:
>>>>>>>
>>>>>>> 2015-08-26 01:35:21,791 INFO [main] hdfs.MiniDFSCluster
>>>>>>> (MiniDFSCluster.java:waitActive(2055)) - Cluster is active
>>>>>>>
>>>>>>> 2015-08-26 01:35:40,334 INFO [main] util.Utils
>>>>>>> (Logging.scala:logInfo(59)) - Successfully started service 
>>>>>>> 'sparkDriver' on
>>>>>>> port 56941.
>>>>>>>
>>>>>>> I realized that when I start up an HBase instance from the command line,
>>>>>>> my test method for Spark connects to it!
>>>>>>>
>>>>>>> So, does that mean it doesn't care about the conf I passed to it? Any
>>>>>>> ideas about how to solve it?
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
