[
https://issues.apache.org/jira/browse/SPARK-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated SPARK-7270:
-----------------------------
Assignee: Liang-Chi Hsieh
> StringType dynamic partition cast to DecimalType in Spark Sql Hive
> -------------------------------------------------------------------
>
> Key: SPARK-7270
> URL: https://issues.apache.org/jira/browse/SPARK-7270
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Feixiang Yan
> Assignee: Liang-Chi Hsieh
> Fix For: 1.4.0
>
>
> Create a Hive table with two partition columns, the first of type bigint and
> the second of type string. When the table is overwritten by an insert with one
> static partition and one dynamic partition, the value of the second
> (StringType) dynamic partition is cast to DecimalType.
> {noformat}
> desc test;
> OK
> a string None
> b bigint None
> c string None
>
> # Partition Information
> # col_name data_type comment
>
> b bigint None
> c string None
> {noformat}
> When the following Hive SQL is run in a HiveContext:
> {noformat}sqlContext.sql("insert overwrite table test partition (b=1,c)
> select 'a','c' from ptest"){noformat}
> the resulting partition is:
> {noformat}test[1,__HIVE_DEFAULT_PARTITION__]{noformat}
> Spark log:
> {noformat}15/04/30 10:38:09 WARN HiveConf: DEPRECATED:
> hive.metastore.ds.retry.* no longer has any effect. Use
> hive.hmshandler.retry.* instead
> 15/04/30 10:38:09 INFO ParseDriver: Parsing command: insert overwrite table
> test partition (b=1,c) select 'a','c' from ptest
> 15/04/30 10:38:09 INFO ParseDriver: Parse Completed
> 15/04/30 10:38:09 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no
> longer has any effect. Use hive.hmshandler.retry.* instead
> 15/04/30 10:38:10 INFO HiveMetaStore: 0: Opening raw store with implemenation
> class:org.apache.hadoop.hive.metastore.ObjectStore
> 15/04/30 10:38:10 INFO ObjectStore: ObjectStore, initialize called
> 15/04/30 10:38:10 INFO Persistence: Property datanucleus.cache.level2 unknown
> - will be ignored
> 15/04/30 10:38:10 INFO Persistence: Property
> hive.metastore.integral.jdo.pushdown unknown - will be ignored
> 15/04/30 10:38:10 WARN Connection: BoneCP specified but not present in
> CLASSPATH (or one of dependencies)
> 15/04/30 10:38:10 WARN Connection: BoneCP specified but not present in
> CLASSPATH (or one of dependencies)
> 15/04/30 10:38:11 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no
> longer has any effect. Use hive.hmshandler.retry.* instead
> 15/04/30 10:38:11 INFO ObjectStore: Setting MetaStore object pin classes with
> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
> 15/04/30 10:38:11 INFO Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
> "embedded-only" so does not have its own datastore table.
> 15/04/30 10:38:11 INFO Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only"
> so does not have its own datastore table.
> 15/04/30 10:38:12 INFO Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
> "embedded-only" so does not have its own datastore table.
> 15/04/30 10:38:12 INFO Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only"
> so does not have its own datastore table.
> 15/04/30 10:38:12 INFO Query: Reading in results for query
> "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is
> closing
> 15/04/30 10:38:12 INFO ObjectStore: Initialized ObjectStore
> 15/04/30 10:38:12 INFO HiveMetaStore: Added admin role in metastore
> 15/04/30 10:38:12 INFO HiveMetaStore: Added public role in metastore
> 15/04/30 10:38:12 INFO HiveMetaStore: No user is added in admin role, since
> config is empty
> 15/04/30 10:38:12 INFO SessionState: No Tez session required at this point.
> hive.execution.engine=mr.
> 15/04/30 10:38:13 INFO HiveMetaStore: 0: get_table : db=default tbl=test
> 15/04/30 10:38:13 INFO audit: ugi=root ip=unknown-ip-addr
> cmd=get_table : db=default tbl=test
> 15/04/30 10:38:13 INFO HiveMetaStore: 0: get_partitions : db=default tbl=test
> 15/04/30 10:38:13 INFO audit: ugi=root ip=unknown-ip-addr
> cmd=get_partitions : db=default tbl=test
> 15/04/30 10:38:13 INFO HiveMetaStore: 0: get_table : db=default tbl=ptest
> 15/04/30 10:38:13 INFO audit: ugi=root ip=unknown-ip-addr
> cmd=get_table : db=default tbl=ptest
> 15/04/30 10:38:13 INFO HiveMetaStore: 0: get_partitions : db=default tbl=ptest
> 15/04/30 10:38:13 INFO audit: ugi=root ip=unknown-ip-addr
> cmd=get_partitions : db=default tbl=ptest
> 15/04/30 10:38:13 INFO deprecation: mapred.map.tasks is deprecated. Instead,
> use mapreduce.job.maps
> 15/04/30 10:38:13 INFO MemoryStore: ensureFreeSpace(451930) called with
> curMem=0, maxMem=2291041566
> 15/04/30 10:38:13 INFO MemoryStore: Block broadcast_0 stored as values in
> memory (estimated size 441.3 KB, free 2.1 GB)
> 15/04/30 10:38:13 INFO MemoryStore: ensureFreeSpace(71321) called with
> curMem=451930, maxMem=2291041566
> 15/04/30 10:38:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes
> in memory (estimated size 69.6 KB, free 2.1 GB)
> 15/04/30 10:38:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
> on 10.134.72.169:45859 (size: 69.6 KB, free: 2.1 GB)
> 15/04/30 10:38:13 INFO BlockManagerMaster: Updated info of block
> broadcast_0_piece0
> 15/04/30 10:38:13 INFO SparkContext: Created broadcast 0 from broadcast at
> TableReader.scala:68
> 15/04/30 10:38:13 INFO deprecation: mapred.output.compress is deprecated.
> Instead, use mapreduce.output.fileoutputformat.compress
> 15/04/30 10:38:13 INFO deprecation: mapred.output.compression.codec is
> deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
> 15/04/30 10:38:13 INFO deprecation: mapred.output.compression.type is
> deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
> 15/04/30 10:38:14 INFO deprecation: mapred.job.id is deprecated. Instead, use
> mapreduce.job.id
> 15/04/30 10:38:14 INFO deprecation: mapred.tip.id is deprecated. Instead, use
> mapreduce.task.id
> 15/04/30 10:38:14 INFO deprecation: mapred.task.id is deprecated. Instead,
> use mapreduce.task.attempt.id
> 15/04/30 10:38:14 INFO deprecation: mapred.task.is.map is deprecated.
> Instead, use mapreduce.task.ismap
> 15/04/30 10:38:14 INFO deprecation: mapred.task.partition is deprecated.
> Instead, use mapreduce.task.partition
> 15/04/30 10:38:14 INFO GPLNativeCodeLoader: Loaded native gpl library
> 15/04/30 10:38:14 INFO LzoCodec: Successfully loaded & initialized native-lzo
> library [hadoop-lzo rev 7041408c0d57cb3b6f51d004772ccf5073ecc95e]
> 15/04/30 10:38:14 INFO FileInputFormat: Total input paths to process : 1
> 15/04/30 10:38:14 INFO SparkContext: Starting job: runJob at
> InsertIntoHiveTable.scala:93
> 15/04/30 10:38:14 INFO DAGScheduler: Got job 0 (runJob at
> InsertIntoHiveTable.scala:93) with 1 output partitions (allowLocal=false)
> 15/04/30 10:38:14 INFO DAGScheduler: Final stage: Stage 0(runJob at
> InsertIntoHiveTable.scala:93)
> 15/04/30 10:38:14 INFO DAGScheduler: Parents of final stage: List()
> 15/04/30 10:38:14 INFO DAGScheduler: Missing parents: List()
> 15/04/30 10:38:14 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[5]
> at mapPartitions at basicOperators.scala:43), which has no missing parents
> 15/04/30 10:38:14 INFO MemoryStore: ensureFreeSpace(125560) called with
> curMem=523251, maxMem=2291041566
> 15/04/30 10:38:14 INFO MemoryStore: Block broadcast_1 stored as values in
> memory (estimated size 122.6 KB, free 2.1 GB)
> 15/04/30 10:38:14 INFO MemoryStore: ensureFreeSpace(82648) called with
> curMem=648811, maxMem=2291041566
> 15/04/30 10:38:14 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes
> in memory (estimated size 80.7 KB, free 2.1 GB)
> 15/04/30 10:38:14 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
> on 10.134.72.169:45859 (size: 80.7 KB, free: 2.1 GB)
> 15/04/30 10:38:14 INFO BlockManagerMaster: Updated info of block
> broadcast_1_piece0
> 15/04/30 10:38:14 INFO SparkContext: Created broadcast 1 from broadcast at
> DAGScheduler.scala:838
> 15/04/30 10:38:14 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0
> (MapPartitionsRDD[5] at mapPartitions at basicOperators.scala:43)
> 15/04/30 10:38:14 INFO YarnClientClusterScheduler: Adding task set 0.0 with 1
> tasks
> 15/04/30 10:38:14 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0,
> rsync.slave006.yarn.hadoop.sjs.sogou-op.org, NODE_LOCAL, 1794 bytes)
> 15/04/30 10:38:14 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
> on rsync.slave006.yarn.hadoop.sjs.sogou-op.org:55678 (size: 80.7 KB, free:
> 5.3 GB)
> 15/04/30 10:38:16 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
> on rsync.slave006.yarn.hadoop.sjs.sogou-op.org:55678 (size: 69.6 KB, free:
> 5.3 GB)
> 15/04/30 10:38:17 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0)
> in 3152 ms on rsync.slave006.yarn.hadoop.sjs.sogou-op.org (1/1)
> 15/04/30 10:38:17 INFO DAGScheduler: Stage 0 (runJob at
> InsertIntoHiveTable.scala:93) finished in 3.162 s
> 15/04/30 10:38:17 INFO YarnClientClusterScheduler: Removed TaskSet 0.0, whose
> tasks have all completed, from pool
> 15/04/30 10:38:17 INFO DAGScheduler: Job 0 finished: runJob at
> InsertIntoHiveTable.scala:93, took 3.369777 s
> 15/04/30 10:38:17 INFO HiveMetaStore: 0: partition_name_has_valid_characters
> 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr
> cmd=partition_name_has_valid_characters
> 15/04/30 10:38:17 INFO HiveMetaStore: 0: partition_name_has_valid_characters
> 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr
> cmd=partition_name_has_valid_characters
> 15/04/30 10:38:17 WARN UserGroupInformation: No groups available for user root
> 15/04/30 10:38:17 WARN UserGroupInformation: No groups available for user root
> 15/04/30 10:38:17 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no
> longer has any effect. Use hive.hmshandler.retry.* instead
> 15/04/30 10:38:17 INFO HiveMetaStore: 0: get_table : db=default tbl=test
> 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr
> cmd=get_table : db=default tbl=test
> 15/04/30 10:38:17 INFO HiveMetaStore: 0: get_partition_with_auth : db=default
> tbl=test[1,__HIVE_DEFAULT_PARTITION__]
> 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr
> cmd=get_partition_with_auth : db=default
> tbl=test[1,__HIVE_DEFAULT_PARTITION__]
> 15/04/30 10:38:17 INFO Hive: Replacing
> src:hdfs://yarncluster/tmp/hive-root/hive_2015-04-30_10-38-13_846_3096248751564356035-1/-ext-10000/c=__HIVE_DEFAULT_PARTITION__;dest:
> hdfs://yarncluster/user/root/hive/warehouse/test/b=1/c=__HIVE_DEFAULT_PARTITION__;Status:true
> 15/04/30 10:38:17 INFO HiveMetaStore: 0: get_partition_with_auth : db=default
> tbl=test[1,__HIVE_DEFAULT_PARTITION__]
> 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr
> cmd=get_partition_with_auth : db=default
> tbl=test[1,__HIVE_DEFAULT_PARTITION__]
> 15/04/30 10:38:17 INFO HiveMetaStore: 0: append_partition : db=default
> tbl=test[1,__HIVE_DEFAULT_PARTITION__]
> 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr
> cmd=append_partition : db=default tbl=test[1,__HIVE_DEFAULT_PARTITION__]
>
> 15/04/30 10:38:17 WARN log: Updating partition stats fast for: test
> 15/04/30 10:38:17 WARN log: Updated size to 10
> 15/04/30 10:38:17 INFO Hive: New loading path =
> hdfs://yarncluster/tmp/hive-root/hive_2015-04-30_10-38-13_846_3096248751564356035-1/-ext-10000/c=__HIVE_DEFAULT_PARTITION__
> with partSpec {b=1, c=__HIVE_DEFAULT_PARTITION__}
> res0: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[0] at RDD at SchemaRDD.scala:108
> == Query Plan ==
> == Physical Plan ==
> InsertIntoHiveTable (MetastoreRelation default, test, None), Map(b ->
> Some(1), c -> None), true
> Project [a AS _c0#0,CAST(CAST(c AS _c1#1, DecimalType()), LongType) AS
> _c1#8L]
> HiveTableScan [], (MetastoreRelation default, ptest, None), None
> {noformat}
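
One plausible reading of the physical plan above: the two SELECT columns appear to be matched by position against the table's columns a (string) and b (bigint), even though b is already fixed by the static partition spec. The literal 'c' would then be cast to a numeric type, become NULL, and end up in __HIVE_DEFAULT_PARTITION__. The Python sketch below only models that reading; it is not Spark source code, and every name in it (TABLE_COLUMNS, target_types_all_columns, and so on) is hypothetical. The second variant just illustrates what a positional match that skips static partitions would produce; it is not a description of the actual patch.

```python
# Simplified model (an assumption, not Spark code) of matching an INSERT's
# SELECT output to target column types by position.

TABLE_COLUMNS = [("a", "string")]                     # data columns of `test`
PARTITION_KEYS = [("b", "bigint"), ("c", "string")]   # partition columns of `test`
PARTITION_SPEC = {"b": "1", "c": None}                # b is static, c is dynamic


def target_types_all_columns(select_width):
    """Match SELECT output against every column, static partitions included."""
    columns = TABLE_COLUMNS + PARTITION_KEYS
    return [dtype for _, dtype in columns[:select_width]]


def target_types_dynamic_only(select_width):
    """Match SELECT output against data columns plus dynamic partitions only."""
    dynamic = [(n, t) for n, t in PARTITION_KEYS if PARTITION_SPEC[n] is None]
    columns = TABLE_COLUMNS + dynamic
    return [dtype for _, dtype in columns[:select_width]]


def cast(value, dtype):
    """Cast a string literal; a failed numeric cast yields None (SQL NULL)."""
    if dtype == "string":
        return value
    try:
        return int(value)
    except ValueError:
        return None


def partition_dir(value):
    """NULL partition values are written under Hive's default partition name."""
    return "__HIVE_DEFAULT_PARTITION__" if value is None else str(value)


row = ["a", "c"]  # SELECT 'a', 'c' FROM ptest

suspected = [cast(v, t) for v, t in zip(row, target_types_all_columns(len(row)))]
expected = [cast(v, t) for v, t in zip(row, target_types_dynamic_only(len(row)))]

suspected_path = "b=1/c=" + partition_dir(suspected[-1])
expected_path = "b=1/c=" + partition_dir(expected[-1])
print(suspected_path)
print(expected_path)
```

Under this model the first path matches the partition reported in the log, while the second is the partition the query was presumably meant to write.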