[
https://issues.apache.org/jira/browse/SPARK-17236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436510#comment-15436510
]
庞俊辉 commented on SPARK-17236:
-----------------------------
Thank you very much.
The deployment environment is:
3 servers: master, slave01, slave02; the Hadoop, HBase, and Spark clusters all run on the same three servers.
The code is below:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("FileAna").setMaster("spark://master:7077").
  set("spark.driver.host", "192.168.1.139").
  setJars(List("/home/pang/woozoomws/spark-service.jar",
    "/home/pang/woozoomws/spark-service/lib/hbase/hbase-common-1.2.2.jar",
    "/home/pang/woozoomws/spark-service/lib/hbase/hbase-client-1.2.2.jar",
    "/home/pang/woozoomws/spark-service/lib/hbase/hbase-protocol-1.2.2.jar",
    "/home/pang/woozoomws/spark-service/lib/hbase/htrace-core-3.1.0-incubating.jar",
    "/home/pang/woozoomws/spark-service/lib/hbase/hbase-server-1.2.2.jar"))
val sc = new SparkContext(conf)

val hbaseConf = HBaseConfiguration.create()
val jobConf = new JobConf(hbaseConf, this.getClass)
jobConf.setOutputFormat(classOf[TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, "MissionItem")

// Convert a (rowKey, name, age) triple into an HBase Put.
def convert(triple: (Int, String, Int)) = {
  val p = new Put(Bytes.toBytes(triple._1))
  p.addColumn(Bytes.toBytes("data"), Bytes.toBytes("name"), Bytes.toBytes(triple._2))
  p.addColumn(Bytes.toBytes("data"), Bytes.toBytes("age"), Bytes.toBytes(triple._3))
  (new ImmutableBytesWritable, p)
}

val rawData = List((1, "lilei", 14), (2, "hanmei", 18), (3, "someone", 38))
val localData = sc.parallelize(rawData).map(convert)
localData.saveAsHadoopDataset(jobConf)
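As an aside, the RpcRetryingCaller loop on 'hbase:meta' quoted below often appears when the processes doing the write cannot locate HBase through ZooKeeper. Below is a minimal sketch of setting the quorum explicitly before building the JobConf; the hostnames and client port are assumptions based on the three-server deployment described above, not values verified against this cluster:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.mapred.JobConf

// Sketch only: make the ZooKeeper quorum explicit on the configuration
// that the job serializes, instead of relying on hbase-site.xml being
// visible everywhere. Hostnames below are assumed from the deployment.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "master,slave01,slave02")
hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")
val jobConf = new JobConf(hbaseConf, this.getClass)
```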
If I access HBase directly, without going through an RDD, it succeeds, like this:
val hbaseConf = HBaseConfiguration.create()
val table = new HTable(hbaseConf, TableName.valueOf("MissionItem"))
for (i <- 0 until 100) {
  val put = new Put(Bytes.toBytes(String.valueOf(i)))
  put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("x"), Bytes.toBytes(String.valueOf(i)))
  table.put(put)
}
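One detail worth noting between the two snippets (an observation about byte encodings, not a claimed cause of the hang): the RDD path builds row keys with Bytes.toBytes on an Int, while the direct path uses String.valueOf(i), and those produce different byte arrays. A self-contained sketch that mimics the two encodings with java.nio, without needing HBase on the classpath:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// HBase's Bytes.toBytes(Int) yields a 4-byte big-endian array, while
// Bytes.toBytes(String) yields the UTF-8 bytes of the string, so the
// Int key 1 and the String key "1" name different rows.
val intKey: Array[Byte] = ByteBuffer.allocate(4).putInt(1).array()
val strKey: Array[Byte] = "1".getBytes(StandardCharsets.UTF_8)

println(intKey.mkString(","))  // 0,0,0,1
println(strKey.mkString(","))  // 49
```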
> Use saveAsHadoopDataset to save RDD to HBASE, long time no response
> -------------------------------------------------------------------
>
> Key: SPARK-17236
> URL: https://issues.apache.org/jira/browse/SPARK-17236
> Project: Spark
> Issue Type: Question
> Components: Input/Output
> Environment: spark 2.0.0
> hbase 1.2.2
> hadoop 2.6.4
> Reporter: 庞俊辉
>
> In Scala, when using saveAsHadoopDataset to save an RDD to HBase,
> the application hangs after this log output:
> BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.247:55383
> It then stops responding for a very long time.
> After I killed the application, I could see many error logs like this in the web UI:
> 16/08/25 22:50:23 INFO client.RpcRetryingCaller: Call exception, tries=10,
> retries=35, started=38326 ms ago, cancelled=false, msg=row
> 'MissionItem,,99999999999999' on table 'hbase:meta' at
> region=hbase:meta,,1.1588230740, hostname=master,16020,1472135364343, seqNum=0
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]