[
https://issues.apache.org/jira/browse/SPARK-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988998#comment-14988998
]
zhangxiongfei commented on SPARK-11475:
---------------------------------------
Hi [~rekhajoshm]
I think my Hive/HDFS configuration is correct. Following is my test:
*Spark Shell*
{quote}
*sqlContext.table("cig_dw.map_cid2openid").filter($"date" === "2015-11-04").count*
15/11/04 13:39:49 INFO scheduler.DAGScheduler: Job 0 finished: count at
<console>:20, took 17.185704 s
res0: Long = 1351847
{quote}
*Hive*
{quote}
*select count(*) from cig_dw.map_cid2openid where date='2015-11-04'*
Total MapReduce CPU Time Spent: 33 seconds 430 msec
OK
1351847
{quote}
My HDFS HA configuration is *Active NameNode: _datanode1.bitauto.dmp_, Standby
NameNode: _namenode.bitauto.dmp_*. I then did a manual failover, which changed
the HA configuration to "*Active NameNode: _namenode.bitauto.dmp_, Standby
NameNode: _datanode1.bitauto.dmp_*".
After re-running the code, it worked:
{quote}
*sqlContext.range(1L,1000L,2L,2).coalesce(1).saveAsTable("dataframeTable")*
15/11/04 13:59:43 INFO datasources.DefaultWriterContainer: Job
job_201511041359_0000 committed.
15/11/04 13:59:43 INFO parquet.ParquetRelation: Listing
hdfs://namenode.bitauto.dmp:8020/apps/hive/warehouse/dataframetable on driver
15/11/04 13:59:43 INFO parquet.ParquetRelation: Listing
hdfs://namenode.bitauto.dmp:8020/apps/hive/warehouse/dataframetable on driver
15/11/04 13:59:43 INFO parquet.ParquetRelation: Listing
hdfs://namenode.bitauto.dmp:8020/apps/hive/warehouse/dataframetable on driver
15/11/04 13:59:43 INFO hive.HiveContext$$anon$1: Persisting data source
relation `dataframeTable` with a single input path into Hive metastore in Hive
compatible format. Input path:
hdfs://namenode.bitauto.dmp:8020/apps/hive/warehouse/dataframetable.
{quote}
But the persisted path format is hdfs://namenode:8020/xx/xx instead of
"hdfs://nameservice/xx/xx".
I attached my Hive/HDFS configuration files.
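For reference, client-side HDFS HA resolution of a nameservice URI like hdfs://bitautodmp normally requires properties along these lines. This is only an illustrative sketch: the nameservice name, hosts, and port are taken from the paths and log output above, while the nn1/nn2 labels are placeholders, not values from the attached files.
{code:xml}
<!-- hdfs-site.xml: HA nameservice "bitautodmp" (names illustrative) -->
<property>
  <name>dfs.nameservices</name>
  <value>bitautodmp</value>
</property>
<property>
  <name>dfs.ha.namenodes.bitautodmp</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.bitautodmp.nn1</name>
  <value>namenode.bitauto.dmp:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.bitautodmp.nn2</name>
  <value>datanode1.bitauto.dmp:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.bitautodmp</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://bitautodmp</value>
</property>
{code}
With this in place a client should resolve hdfs://bitautodmp/... to whichever NameNode is currently active, which is why a path persisted with a concrete host (hdfs://namenode.bitauto.dmp:8020/...) breaks after a failover.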
> DataFrame API saveAsTable() does not work well for HDFS HA
> ----------------------------------------------------------
>
> Key: SPARK-11475
> URL: https://issues.apache.org/jira/browse/SPARK-11475
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.1
> Environment: Hadoop 2.4 & Spark 1.5.1
> Reporter: zhangxiongfei
> Attachments: dataFrame_saveAsTable.txt, hdfs-site.xml, hive-site.xml
>
>
> I was trying to save a DF to Hive using following code:
> {quote}
> sqlContext.range(1L,1000L,2L,2).coalesce(1).saveAsTable("dataframeTable")
> {quote}
> But got below exception:
> {quote}
> warning: there were 1 deprecation warning(s); re-run with -deprecation for
> details
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
> Operation category READ is not supported in state standby
> at
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1610)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1193)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3516)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:785)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(
> {quote}
> *My Hive configuration is* :
> {quote}
> <property>
> <name>hive.metastore.warehouse.dir</name>
> <value>*/apps/hive/warehouse*</value>
> </property>
> {quote}
> It seems that HDFS HA is not configured; I then tried the code below:
> {quote}
> sqlContext.range(1L,1000L,2L,2).coalesce(1).saveAsParquetFile("hdfs://bitautodmp/apps/hive/warehouse/dataframeTable")
> {quote}
> I verified that the *saveAsParquetFile* API worked well with the following
> commands:
> {quote}
> *hadoop fs -ls /apps/hive/warehouse/dataframeTable*
> Found 4 items
> -rw-r--r-- 3 zhangxf hdfs 0 2015-11-03 17:57
> */apps/hive/warehouse/dataframeTable/_SUCCESS*
> -rw-r--r-- 3 zhangxf hdfs 199 2015-11-03 17:57
> */apps/hive/warehouse/dataframeTable/_common_metadata*
> -rw-r--r-- 3 zhangxf hdfs 325 2015-11-03 17:57
> */apps/hive/warehouse/dataframeTable/_metadata*
> -rw-r--r-- 3 zhangxf hdfs 1098 2015-11-03 17:57
> */apps/hive/warehouse/dataframeTable/part-r-00000-a05a9bf3-b2a6-40e5-b180-818efb2a0f54.gz.parquet*
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)