In Spark 2.4, Apache Arrow was upgraded to version 0.12.0

2019-05-14 Thread
With pyarrow 0.12.1 and arrow jar 0.10, everything runs correctly. With pyarrow 0.12.1 and arrow jar 0.12, this exception occurs: > *Expected schema message in stream, was null or length 0*. Pyarrow was upgraded to version 0.12.1 by another package's dependency, resulting in the inconsistency.
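
If aligning the pyarrow and arrow-jar versions is not immediately possible, a common stopgap is to disable Spark's Arrow transfer path; a minimal sketch, assuming the error came from the Arrow-based conversion and that spark is an active SparkSession:

    from pyspark.sql import SparkSession
    import pyarrow

    spark = SparkSession.builder.getOrCreate()
    # Compare the Python-side Arrow version against the arrow-*.jar
    # versions on the Spark classpath before relying on Arrow transfers.
    print("pyarrow version:", pyarrow.__version__)
    # Stopgap: fall back to the non-Arrow conversion path until the two
    # sides agree (this setting exists in Spark 2.3/2.4).
    spark.conf.set("spark.sql.execution.arrow.enabled", "false")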

Re: [spark-sql] Hive failing on insert empty array into parquet table

2018-12-29 Thread
https://issues.apache.org/jira/browse/HIVE-13632 — 李斌松 wrote on Saturday, 2018-12-29, at 16:08: > Hive has fixed this problem, but the fix is not included in hive-exec-1.2.1.spark2.jar.
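
For context, a minimal sketch of the kind of statement that trips the pre-HIVE-13632 writer bundled as hive-exec-1.2.1.spark2.jar (table name and schema are illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    # Illustrative Hive parquet table with an array column.
    spark.sql("CREATE TABLE arr_t (xs array<int>) STORED AS PARQUET")
    # Writing a zero-element array triggers the failure in the old
    # Hive 1.2.1 parquet writer.
    spark.sql("INSERT INTO arr_t SELECT cast(array() AS array<int>)")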

[Spark SQL] Using zstd: No enum constant parquet.hadoop.metadata.CompressionCodecName.ZSTD

2018-12-19 Thread
The parquet-hadoop-bundle jar is imported into the Spark Hive project. When you compress data with zstd, the codec class may be loaded preferentially from parquet-hadoop-bundle, and then the enum constant parquet.hadoop.metadata.CompressionCodecName.ZSTD cannot be found. > 18/12/20 10:35:28 ERROR Executor:
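
A write that requests the codec looks roughly like the sketch below (output path illustrative); it only succeeds when a parquet-hadoop build that actually defines the ZSTD constant wins on the classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(1000)
    # Requests CompressionCodecName.ZSTD; fails with the enum error when
    # an older parquet-hadoop-bundle without that constant is loaded first.
    df.write.option("compression", "zstd").parquet("/tmp/zstd_out")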

If a DataFrame contains timestamp-type data, toPandas in Spark 2.3 is much slower than in Spark 2.2.

2018-06-06 Thread
If a DataFrame contains timestamp-type data, toPandas in Spark 2.3 is much slower than in Spark 2.2.
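
A commonly suggested mitigation in Spark 2.3 is the Arrow-based conversion path; whether it fully closes the gap for timestamp columns is not settled in the thread. A minimal sketch:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(1000000).withColumn("ts", F.current_timestamp())
    # Arrow-based conversion (Spark 2.3+) avoids row-by-row boxing of
    # timestamps during toPandas; values are adjusted to the session
    # time zone (spark.sql.session.timeZone).
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")
    pdf = df.toPandas()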

Spark 2.3 DataFrame join bug

2018-03-26 Thread
Hi: I'm using Spark 2.3 and found a bug in the DataFrame API. Here is my code:

    sc = sparkSession.sparkContext
    tmp = sparkSession.createDataFrame(
        sc.parallelize([[1, 2, 3, 4], [1, 2, 5, 6], [2, 3, 4, 5], [2, 3, 5, 6]])
    ).toDF('a', 'b', 'c', 'd')
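
The message is truncated before the join itself. As a hypothetical reconstruction (not the reporter's actual code), the self-join pattern Spark 2.3 is known to mishandle looks like this, continuing from the snippet above:

    from pyspark.sql import functions as F

    # Hypothetical continuation: agg is derived from tmp, so tmp['a'] and
    # agg['a'] can resolve to the same attribute, and the join condition
    # silently becomes trivially true.
    agg = tmp.groupBy('a').agg(F.sum('d').alias('total'))
    joined = tmp.join(agg, tmp['a'] == agg['a'])
    joined.show()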

How is data desensitization done (example: select bank_no from users)?

2017-08-19 Thread
For example, an analyst must not be able to see the user's bank card number; it should be shown as asterisks instead. How do you do that in Spark?
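
One way to do this inside Spark itself is a masking projection with built-in column functions; a minimal sketch (the sample data and fixed-length mask format are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    users = spark.createDataFrame([("6222020200112233445",)], ["bank_no"])
    # Replace everything with a fixed-length run of asterisks, keeping
    # only the last 4 digits visible.
    masked = users.withColumn(
        "bank_no",
        F.concat(F.lit("***************"), F.substring("bank_no", -4, 4)),
    )
    masked.show(truncate=False)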

Limit the number of tasks submitted: spark.submit.tasks.threshold.enabled & spark.submit.tasks.threshold

2017-07-11 Thread
Limit the number of tasks a submitted job may run, to keep a single job from monopolizing cluster resources, and to guide users toward setting reasonable query conditions. Attachment: spark_submit_tasks_threshold.patch (binary data).
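
Going by the subject line, usage of the proposed settings would presumably look like the sketch below; note these keys come from the attached patch and do not exist in upstream Spark, and the threshold value is illustrative:

    from pyspark.sql import SparkSession

    # Hypothetical settings introduced by the attached patch; not part of
    # upstream Spark.
    spark = (SparkSession.builder
             .config("spark.submit.tasks.threshold.enabled", "true")
             .config("spark.submit.tasks.threshold", "10000")
             .getOrCreate())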

Does Spark support renaming columns of a Hive (parquet) table?

2017-06-19 Thread
Does Spark support renaming columns of a Hive (parquet) table?

Custom function cannot be accessed across databases

2017-05-23 Thread
A custom function cannot be accessed across databases. Example: the function json_extract_value is registered in database A, but A.json_extract_value cannot be called from database B. In SessionCatalog.java the lookup is externalCatalog.getFunction(currentDb, name.funcName).
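
A repro sketch (database names come from the example above; the implementing class is hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    spark.sql("USE a")
    # The implementing class name is hypothetical.
    spark.sql("CREATE FUNCTION json_extract_value AS 'com.example.JsonExtractValue'")
    spark.sql("USE b")
    # Fails in Spark 2.x: per the report, SessionCatalog resolves the
    # function with externalCatalog.getFunction(currentDb, name.funcName),
    # so the database qualifier is not honored.
    spark.sql("SELECT a.json_extract_value('{\"k\": 1}', '$.k')").show()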

How does the Spark HiveServer dynamically update a function's dependent jar?

2017-05-18 Thread
Create a temporary function that references a jar file on HDFS; after the jar file is updated, the change does not take effect immediately, and HiveServer has to be restarted.
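
The setup in question presumably looks like this (function name, class, and HDFS path are illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    # The jar is fetched and added to the running JVM once; replacing the
    # file on HDFS afterwards does not reload the class, hence the restart.
    spark.sql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf' "
              "USING JAR 'hdfs:///libs/my_udf.jar'")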

Spark Thrift server: hiveStatement.getQueryLog returns empty

2017-03-12 Thread
Spark Thrift server: hiveStatement.getQueryLog returns empty?

Spark SQL table permission control?

2017-02-25 Thread
We connect to the Spark Thrift server over JDBC and execute Hive SQL, and want to check read/write permissions on the tables involved. In Hive on Spark you can control permissions by extending a hook; what is the corresponding extension point in Spark on Hive?

How do I set the currentDatabase value when the SparkSession is initialized?

2017-01-10 Thread
When Spark reads a Hive table, catalog.currentDatabase is "default". How do I set the currentDatabase value when the SparkSession is initialized? hive.metastore.uris = thrift://localhost:9083 (IP address, or fully-qualified domain name, and port of the metastore host)
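
One answer for Spark 2.x is to switch the database right after the session is built; a minimal sketch (metastore URI from the snippet above, database name illustrative):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .enableHiveSupport()
             .config("hive.metastore.uris", "thrift://localhost:9083")
             .getOrCreate())
    # Sets the database used for subsequent unqualified table names;
    # the database name here is illustrative.
    spark.catalog.setCurrentDatabase("my_db")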

What is the difference between Hive on Spark and Spark on Hive?

2017-01-09 Thread
What is the difference between Hive on Spark and Spark on Hive?

Can a Spark Hive UDF read broadcast variables?

2016-12-17 Thread
Can a Spark Hive UDF read broadcast variables?
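
The question concerns Hive UDFs (Java classes), where there is no obvious handle on a broadcast variable. For comparison, a PySpark UDF can read one through its closure; a sketch with an illustrative mapping:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()
    # Illustrative broadcast lookup table.
    lookup = spark.sparkContext.broadcast({"A": "alpha"})

    # The broadcast handle is captured by the lambda's closure; .value is
    # read on the executors when the UDF runs.
    decode = F.udf(lambda code: lookup.value.get(code), StringType())
    spark.createDataFrame([("A",)], ["code"]).select(decode("code")).show()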

How to dynamically register a UDF via reflection?

2016-12-15 Thread
How to dynamically register a UDF via reflection?

    java.lang.UnsupportedOperationException: Schema for type _$13 is not supported
        at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:153)
        at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
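
The exception is thrown by Scala-side schema inference: ScalaReflection cannot derive a schema for the unresolved type parameter (_$13) of a function obtained reflectively. Supplying the return type explicitly sidesteps the inference; a PySpark sketch of that idea (function name and body are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()
    # Registering with an explicit return type means no schema needs to
    # be inferred from the function's (possibly opaque) signature.
    spark.udf.register("upper_udf", lambda s: s.upper() if s else None, StringType())
    spark.sql("SELECT upper_udf('abc')").show()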