[
https://issues.apache.org/jira/browse/SPARK-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988998#comment-14988998
]
zhangxiongfei commented on SPARK-11475:
---------------------------------------
Hi [~rekhajoshm]
I think my Hive/HDFS configuration is correct. Following is my test:
*Spark Shell*
{quote}
*sqlContext.table("cig_dw.map_cid2openid").filter($"date" === "2015-11-04").count*
15/11/04 13:39:49 INFO scheduler.DAGScheduler: Job 0 finished: count at
<console>:20, took 17.185704 s
res0: Long = 1351847
{quote}
*Hive*
{quote}
*select count(*) from cig_dw.map_cid2openid where date='2015-11-04'*
Total MapReduce CPU Time Spent: 33 seconds 430 msec
OK
1351847
{quote}
My HDFS HA configuration is *Active NameNode: _datanode1.bitauto.dmp_, Standby
NameNode: _namenode.bitauto.dmp_*. I then did a manual failover, which changed
the HA configuration to "*Active NameNode: _namenode.bitauto.dmp_, Standby
NameNode: _datanode1.bitauto.dmp_*".
After re-running the code, it worked:
{quote}
*sqlContext.range(1L,1000L,2L,2).coalesce(1).saveAsTable("dataframeTable")*
15/11/04 13:59:43 INFO datasources.DefaultWriterContainer: Job
job_201511041359_0000 committed.
15/11/04 13:59:43 INFO parquet.ParquetRelation: Listing
hdfs://namenode.bitauto.dmp:8020/apps/hive/warehouse/dataframetable on driver
15/11/04 13:59:43 INFO parquet.ParquetRelation: Listing
hdfs://namenode.bitauto.dmp:8020/apps/hive/warehouse/dataframetable on driver
15/11/04 13:59:43 INFO parquet.ParquetRelation: Listing
hdfs://namenode.bitauto.dmp:8020/apps/hive/warehouse/dataframetable on driver
15/11/04 13:59:43 INFO hive.HiveContext$$anon$1: Persisting data source
relation `dataframeTable` with a single input path into Hive metastore in Hive
compatible format. Input path:
hdfs://namenode.bitauto.dmp:8020/apps/hive/warehouse/dataframetable.
{quote}
But the persisted path format is hdfs://namenode:8020/xx/xx instead of
"hdfs://nameservice/xx/xx".
I attached my Hive/HDFS configuration files.
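For reference, client-side HDFS HA resolution of a nameservice URI like hdfs://bitautodmp normally requires properties along these lines. This is only an illustrative sketch: the nameservice name, hosts, and port are taken from the paths and log output above, while the nn1/nn2 labels are placeholders, not values from the attached files.
{code:xml}
<!-- hdfs-site.xml: HA nameservice "bitautodmp" (names illustrative) -->
<property>
  <name>dfs.nameservices</name>
  <value>bitautodmp</value>
</property>
<property>
  <name>dfs.ha.namenodes.bitautodmp</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.bitautodmp.nn1</name>
  <value>namenode.bitauto.dmp:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.bitautodmp.nn2</name>
  <value>datanode1.bitauto.dmp:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.bitautodmp</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://bitautodmp</value>
</property>
{code}
With this in place a client should resolve hdfs://bitautodmp/... to whichever NameNode is currently active, which is why a path persisted with a concrete host (hdfs://namenode.bitauto.dmp:8020/...) breaks after a failover.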
> DataFrame API saveAsTable() does not work well for HDFS HA
> ----------------------------------------------------------
>
> Key: SPARK-11475
> URL: https://issues.apache.org/jira/browse/SPARK-11475
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.1
> Environment: Hadoop 2.4 & Spark 1.5.1
> Reporter: zhangxiongfei
> Attachments: dataFrame_saveAsTable.txt, hdfs-site.xml, hive-site.xml
>
>
> I was trying to save a DF to Hive using following code:
> {quote}
> sqlContext.range(1L,1000L,2L,2).coalesce(1).saveAsTable("dataframeTable")
> {quote}
> But got below exception:
> {quote}
> warning: there were 1 deprecation warning(s); re-run with -deprecation for
> details
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
> Operation category READ is not supported in state standby
> at
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1610)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1193)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3516)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:785)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(
> {quote}
> *My Hive configuration is* :
> {quote}
> <property>
> <name>hive.metastore.warehouse.dir</name>
> <value>*/apps/hive/warehouse*</value>
> </property>
> {quote}
> It seems that HDFS HA is not configured; I then tried the code below:
> {quote}
> sqlContext.range(1L,1000L,2L,2).coalesce(1).saveAsParquetFile("hdfs://bitautodmp/apps/hive/warehouse/dataframeTable")
> {quote}
> I verified that the *saveAsParquetFile* API worked well with the following
> commands:
> {quote}
> *hadoop fs -ls /apps/hive/warehouse/dataframeTable*
> Found 4 items
> -rw-r--r-- 3 zhangxf hdfs 0 2015-11-03 17:57
> */apps/hive/warehouse/dataframeTable/_SUCCESS*
> -rw-r--r-- 3 zhangxf hdfs 199 2015-11-03 17:57
> */apps/hive/warehouse/dataframeTable/_common_metadata*
> -rw-r--r-- 3 zhangxf hdfs 325 2015-11-03 17:57
> */apps/hive/warehouse/dataframeTable/_metadata*
> -rw-r--r-- 3 zhangxf hdfs 1098 2015-11-03 17:57
> */apps/hive/warehouse/dataframeTable/part-r-00000-a05a9bf3-b2a6-40e5-b180-818efb2a0f54.gz.parquet*
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)