how to dynamically partition a dataframe

2017-01-17 Thread lk_spark
lk_spark
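
A minimal sketch of what "dynamic partitioning" usually means for a DataFrame write: one output subdirectory per value of a partition column. The column name "day" and the paths are hypothetical, assuming a SparkSession named spark as in spark-shell.

  val df = spark.read.parquet("/tmp/input")   // hypothetical source
  // each distinct value of "day" becomes a subdirectory, e.g. .../day=2017-01-17/
  df.write.mode("overwrite").partitionBy("day").parquet("/tmp/output")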

help, I want to call spark-submit from a Java shell

2017-01-20 Thread lk_spark
in 120 seconds ... 8 more 17/01/20 06:39:05 ERROR CoarseGrainedExecutorBackend: Driver 192.168.0.136:51197 disassociated! Shutting down. 2017-01-20 lk_spark
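
A minimal sketch of launching spark-submit programmatically from JVM code with org.apache.spark.launcher.SparkLauncher; the Spark home, jar path, main class, and master are hypothetical.

  import org.apache.spark.launcher.SparkLauncher

  // starts spark-submit as a child process and returns a handle to track its state
  val handle = new SparkLauncher()
    .setSparkHome("/opt/spark")                 // hypothetical
    .setAppResource("/path/to/my-app.jar")      // hypothetical
    .setMainClass("com.example.MyApp")          // hypothetical
    .setMaster("yarn")
    .setDeployMode("cluster")
    .startApplication()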

how to use newAPIHadoopFile

2017-01-16 Thread lk_spark
-01-17 lk_spark
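
A minimal sketch of newAPIHadoopFile with the new-API TextInputFormat, assuming a SparkContext named sc as in spark-shell; the input path is hypothetical.

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

  // key = byte offset, value = line of text
  val rdd = sc.newAPIHadoopFile(
    "/tmp/data.txt",
    classOf[TextInputFormat],
    classOf[LongWritable],
    classOf[Text])
  val lines = rdd.map { case (_, text) => text.toString }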

java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext

2017-02-26 Thread lk_spark
value().matches("\\d{4}.*")).map(record => { val assembly = record.topic() val value = record.value val datatime = value.substring(0, 22) val level = value.substring(24, 27) (assembly,value,datatime,level) }) how can I pass a parameter to the map function? 2017-02-27 lk_spark
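
A minimal sketch of passing parameters into the map function: capture them as plain local values (or a broadcast variable) and keep the StreamingContext itself out of the closure, since it is not serializable. The stream name and index values are assumptions based on the quoted code.

  val levelStart = 24   // hypothetical parameters captured by the closure
  val levelEnd = 27
  val parsed = stream.map { record =>
    val value = record.value
    (record.topic(), value, value.substring(0, 22), value.substring(levelStart, levelEnd))
  }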

how to merge dataframe write output files

2016-11-09 Thread lk_spark
-10 15:11 /parquetdata/weixin/biztags/biztag2/part-r-00176-0f61afe4-23e8-40bb-b30b-09652ca677bc more and more... 2016-11-10 lk_spark
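
A minimal sketch of reducing the number of output files by repartitioning before the write; 16 is an arbitrary example (coalesce(16) would avoid the shuffle at the cost of read parallelism), and the output path is hypothetical.

  val df = spark.read.parquet("/parquetdata/weixin/biztags/biztag2")
  df.repartition(16)
    .write
    .mode("overwrite")
    .parquet("/parquetdata/weixin/biztags/biztag2_merged")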

Re: RE: how to merge dataframe write output files

2016-11-10 Thread lk_spark
ly be too much for JRE or any other runtime to load in memory on a single box. From: lk_spark [mailto:lk_sp...@163.com] Sent: Wednesday, November 9, 2016 11:29 PM To: user.spark <user@spark.apache.org> Subject: how to merge dataframe write output files hi,all: when I call api df.write.p

Re: Re: how to extract arraytype data to file

2016-10-18 Thread lk_spark
Thank you, all of you. explode() is helpful: df.selectExpr("explode(bizs) as e").select("e.*").show() 2016-10-19 lk_spark From: Hyukjin Kwon <gurwls...@gmail.com> Sent: 2016-10-19 13:16 Subject: Re: how to extract arraytype data to file To: "Divya Gehlot"&l

how to extract arraytype data to file

2016-10-18 Thread lk_spark
code|
+--------------------+--------------------+
|[4938200, 4938201...|[罗甸网警, 室内设计师杨焰红, ...|
|[4938300, 4938301...|[SDCS十全九美, 旅梦长大, ...|
|[4938400, 4938401...|[日重重工液压行走回转, 氧老家,...|
|[4938500, 4938501...|[PABXSLZ, 陈少燕, 笑蜜...|
|[4938600, 4938601...|[税海微云, 西域美农云家店, 福...|
+--------------------+--------------------+
what I want is to read the columns as normal rows. How can I do it? 2016-10-19 lk_spark
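
A minimal sketch following the explode() answer quoted above: flatten the array column into ordinary rows, then write them out. The column name "bizs" comes from that reply; the output path is hypothetical.

  val flat = df.selectExpr("explode(bizs) as e").select("e.*")
  flat.write.mode("overwrite").json("/tmp/biz_rows")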

Spark ExternalTable doesn't recognize subdir

2016-10-19 Thread lk_spark
sh the metadata. Spark doesn't recognize the data in the subdir. How can I do it? 2016-10-20 lk_spark
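
A minimal sketch of letting the Hive readers descend into subdirectories, assuming the external table is read through Hive input formats; the table name is hypothetical and whether these settings apply depends on the table's storage format.

  // read files located in subdirectories of the table's location
  spark.sparkContext.hadoopConfiguration
    .set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
  spark.sql("SET hive.mapred.supports.subdirectories=true")
  spark.sql("SELECT count(*) FROM my_external_table").show()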

Re: Re: How to iterate the element of an array in DataFrame?

2016-10-21 Thread lk_spark
: string (nullable = true) 2016-10-21 lk_spark From: 颜发才(Yan Facai) <yaf...@gmail.com> Sent: 2016-10-21 15:35 Subject: Re: How to iterate the element of an array in DataFrame? To: "user.spark"<user@spark.apache.org> Cc: I don't know how to construct `array<struct<category:st
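
A minimal sketch of building an array<struct<...>> type explicitly with the StructType API; the field names ("category", "weight", "tags") are assumptions filling in the truncated quote.

  import org.apache.spark.sql.types._

  val element = StructType(Seq(
    StructField("category", StringType),
    StructField("weight", DoubleType)))
  val schema = StructType(Seq(
    StructField("id", StringType),
    StructField("tags", ArrayType(element))))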

how to change datatype by using StructType

2017-01-11 Thread lk_spark
level row object), 0, name), StringType), true) if I change my code it will work: val rowRDD = peopleRDD.map(_.split(",")).map(attributes => Row(attributes(0), attributes(1).toInt)) but this is not a good idea. 2017-01-12 lk_spark
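
A minimal sketch of driving the conversion from a configurable StructType instead of hard-coding .toInt per column; peopleRDD is the RDD[String] from the quoted code, and the field names are assumptions.

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types._

  val schema = StructType(Seq(
    StructField("name", StringType),
    StructField("age", IntegerType),
    StructField("year", IntegerType)))

  // cast each split value according to the dataType declared for its field
  val rowRDD = peopleRDD.map(_.split(",")).map { attributes =>
    val values = attributes.zip(schema.fields).map {
      case (v, f) => f.dataType match {
        case IntegerType => v.trim.toInt
        case LongType    => v.trim.toLong
        case DoubleType  => v.trim.toDouble
        case _           => v
      }
    }
    Row(values: _*)
  }
  val df = spark.createDataFrame(rowRDD, schema)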

Re: Re: how to change datatype by using StructType

2017-01-11 Thread lk_spark
yes, the field year is in my data: kevin,30,2016 shen,30,2016 kai,33,2016 wei,30,2016 this will not work: val rowRDD = peopleRDD.map(_.split(",")).map(attributes => Row(attributes(0), attributes(1), attributes(2))) but I need to read the data in a configurable way. 2017-01-12 lk_

Re: Re: Re: how to change datatype by using StructType

2017-01-12 Thread lk_spark
Thank you Nicholas, if the source data is in CSV format the CSV reader works well. 2017-01-13 lk_spark From: Nicholas Hakobian <nicholas.hakob...@rallyhealth.com> Sent: 2017-01-13 08:35 Subject: Re: Re: Re: how to change datatype by using StructType To: "lk_spark"<lk_sp...@163

Re: Re: Re: how to change datatype by using StructType

2017-01-12 Thread lk_spark
cificUnsafeProjection.apply_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290) all the fields were Any, what should I do? 2017-01-12 lk_spark From: "lk_spark

Re: Re: Re: how to change datatype by using StructType

2017-01-12 Thread lk_spark
} else { ab += attributes(i) } } new GenericRow(ab.toArray) } } 2017-01-13 lk_spark From: "lk_spark"<lk_sp...@163.com> Sent: 2017-01-13 09:49 Subject: Re: Re: Re: how to change datatype by using StructType To: "Nichola

Re: Re: how to add column to dataframe

2016-12-06 Thread lk_spark
thanks for the reply. I will look up how to use na.fill, and I don't know how to get the value of the column and do some operation like substr or split. 2016-12-06 lk_spark From: Pankaj Wahane <pankajwah...@live.com> Sent: 2016-12-06 17:39 Subject: Re: how to add column to dataframe To: "lk_s
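
A minimal sketch of deriving new columns from an existing string column with the built-in split and substring functions; the column names "url", "host", and "prefix" are assumptions.

  import org.apache.spark.sql.functions.{col, split, substring}

  val df2 = df
    .withColumn("host", split(col("url"), "/").getItem(2))   // piece of the URL after "http://"
    .withColumn("prefix", substring(col("url"), 0, 16))       // first 16 characters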

how to add column to dataframe

2016-12-06 Thread lk_spark
| null|http://mp.weixin|
| null|http://mp.weixin|
| null|http://mp.weixin|
| null|http://mp.weixin|
| null|http://mp.weixin|
Why is what I got null? 2016-12-06 lk_spark

Re: Re: Re: how to add column to dataframe

2016-12-06 Thread lk_spark
QwOA==|http://mp.weixin|
|MzAwOTIxMTcyMQ==|http://mp.weixin|
|MzA3OTAyNzY2OQ==|http://mp.weixin|
|MjM5NDAzMDAwMA==|http://mp.weixin|
|MzAwMjE4MzU0Nw==|http://mp.weixin|
|MzA4NzcyNjI0Mw==|http://mp.weixin|
|MzI5OTE5Nzc5Ng==|http://mp.weixin|
2016-12-06 lk_spark

Re: Re: Re: how to call recommend method from ml.recommendation.ALS

2017-03-15 Thread lk_spark
Thank you, that's what I wanted to confirm. 2017-03-16 lk_spark From: Yuhao Yang <hhb...@gmail.com> Sent: 2017-03-16 13:05 Subject: Re: Re: how to call recommend method from ml.recommendation.ALS To: "lk_spark"<lk_sp...@163.com> Cc: "任弘迪"<ryan.hd@gmail.com&

Re: Re: how to call recommend method from ml.recommendation.ALS

2017-03-15 Thread lk_spark
thanks for your reply, what I exactly want to know is: in package mllib.recommendation, MatrixFactorizationModel has methods like recommendProducts, but I didn't find them in package ml.recommendation. How can I do the same thing as mllib when I use ml? 2017-03-16 lk_spark From: 任弘迪

how to call recommend method from ml.recommendation.ALS

2017-03-15 Thread lk_spark
hi, all: under Spark 2.0, I want to know how I can do the recommend action after training an ml.recommendation.ALSModel. I tried to save the model and load it with MatrixFactorizationModel but got an error. 2017-03-16 lk_spark
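
A minimal sketch, assuming an upgrade to Spark 2.2 or later, where ml.recommendation.ALSModel gained recommendForAllUsers/recommendForAllItems; on Spark 2.0 itself these methods do not exist, and the model path is hypothetical.

  import org.apache.spark.ml.recommendation.ALSModel

  val model = ALSModel.load("/tmp/als_model")        // hypothetical path
  val top10PerUser = model.recommendForAllUsers(10)  // top 10 items per user
  top10PerUser.show(false)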

spark on yarn cluster mode can't use saveAsTable?

2017-05-15 Thread lk_spark
give me some clue? 2017-05-15 lk_spark

spark2.1 and kafka0.10

2017-06-20 Thread lk_spark
hi, all: https://issues.apache.org/jira/browse/SPARK-19680 is there any way to patch this issue? I met the same problem. 2017-06-20 lk_spark

Re: spark2.1 kafka0.10

2017-06-21 Thread lk_spark
) at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 2017-06-22 lk_spark From: "lk_spark"<lk_sp...@163.com> Sent: 2017-06-22 11:13 Subject: spark2.1 kafka0.10 To: "user.spark"<user@spark.apache.org> Cc: hi,all

spark2.1 kafka0.10

2017-06-21 Thread lk_spark
ERROR JobScheduler: Error generating jobs for time 1498098896000 ms java.lang.IllegalStateException: No current assignment for partition pages-2 I don't know why. 2017-06-22 lk_spark

Re: Re: spark2.1 kafka0.10

2017-06-21 Thread lk_spark
each topic has 5 partitions, 2 replicas. 2017-06-22 lk_spark From: Pralabh Kumar <pralabhku...@gmail.com> Sent: 2017-06-22 17:23 Subject: Re: spark2.1 kafka0.10 To: "lk_spark"<lk_sp...@163.com> Cc: "user.spark"<user@spark.apache.org> How many replicas, you h

Re: Re: Re: spark2.1 kafka0.10

2017-06-22 Thread lk_spark
thank you Kumar, I will try it later. 2017-06-22 lk_spark From: Pralabh Kumar <pralabhku...@gmail.com> Sent: 2017-06-22 20:20 Subject: Re: Re: spark2.1 kafka0.10 To: "lk_spark"<lk_sp...@163.com> Cc: "user.spark"<user@spark.apache.org> It looks like your rep

Re: Re: Re: how to call udf with parameters

2017-06-15 Thread lk_spark
thanks Kumar, that's really helpful!! 2017-06-16 lk_spark From: Pralabh Kumar <pralabhku...@gmail.com> Sent: 2017-06-16 18:30 Subject: Re: Re: how to call udf with parameters To: "lk_spark"<lk_sp...@163.com> Cc: "user.spark"<user@spark.apache.org> val

how to call udf with parameters

2017-06-15 Thread lk_spark
sException: cannot resolve '`true`' given input columns: [id, text];; 'Project [UDF(text#6, 'true, 'true, '2) AS words#16] +- Project [_1#2 AS id#5, _2#3 AS text#6] +- LocalRelation [_1#2, _2#3] I need help!! 2017-06-16 lk_spark

Re: Re: how to call udf with parameters

2017-06-15 Thread lk_spark
thanks Kumar, I want to know how to call a udf with multiple parameters, maybe a udf that works like a substr function. How can I pass parameters with a begin and end index? I tried it and got errors. Do the udf parameters have to be column types? 2017-06-16 lk_spark From: Pralabh Kumar <pralab
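
A minimal sketch of a multi-parameter UDF: every argument is a Column, so constant begin/end indexes are wrapped in lit() instead of being passed as bare values (which is what produces the "cannot resolve '`true`'" style of error). A DataFrame df with the [id, text] columns is assumed.

  import org.apache.spark.sql.functions.{col, lit, udf}

  // a substr-like UDF taking the string column plus two constant index parameters
  val substrUdf = udf((s: String, begin: Int, end: Int) =>
    s.substring(begin, math.min(end, s.length)))
  val result = df.withColumn("words", substrUdf(col("text"), lit(0), lit(2)))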

Re: spark2.3 on kubernetes

2018-04-07 Thread lk_spark
resolved. I needed to add "kubernetes.default.svc" to the k8s API server TLS config. 2018-04-08 lk_spark From: "lk_spark"<lk_sp...@163.com> Sent: 2018-04-08 11:15 Subject: spark2.3 on kubernetes To: "user"<user@spark.apache.org> Cc: hi,all: I am trying s

spark2.3 on kubernetes

2018-04-07 Thread lk_spark
2.11-2.3.0.jar 2018-04-08 lk_spark
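
A minimal sketch of the Spark 2.3 submission command for Kubernetes, following the running-on-kubernetes documentation; the API server address and container image are placeholders.

  bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar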

Re: about LIVY-424

2018-11-11 Thread lk_spark
ve 5760749 rows of data. After running about 10 times, the Driver physical memory goes beyond 4.5GB and it is killed by YARN. I saw the old generation memory keep growing and it cannot be released by GC. 2018-11-12 lk_spark From: "lk_hadoop" Sent: 2018-11-12 09:37 Subject: about LIVY-424 To: "user"

Re: Re: how to generate a large dataset in parallel

2018-12-13 Thread lk_spark
generate some data in Spark. 2018-12-14 lk_spark From: Jean Georges Perrin Sent: 2018-12-14 11:10 Subject: Re: how to generate a large dataset in parallel To: "lk_spark" Cc: "user.spark" You just want to generate some data in Spark or ingest a large dataset outside of Spark?

how to generate a large dataset in parallel

2018-12-13 Thread lk_spark
-12-14 lk_spark

Re: Re: how to generate a large dataset in parallel

2018-12-14 Thread lk_spark
sorry, for now what I can do is like this: var df5 = spark.read.parquet("/user/devuser/testdata/df1").coalesce(1) df5 = df5.union(df5).union(df5).union(df5).union(df5) 2018-12-14 lk_spark From: 15313776907 <15313776...@163.com> Sent: 2018-12-14 16:39 Subject: Re: how to generat
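
A minimal sketch of generating a large dataset directly in parallel with spark.range instead of unioning a coalesced DataFrame with itself; the row count, partition count, and output path are arbitrary examples.

  import org.apache.spark.sql.functions.rand

  // 1 billion rows spread over 200 partitions, each row with a random value column
  val big = spark.range(0L, 1000000000L, 1L, 200).withColumn("value", rand())
  big.write.mode("overwrite").parquet("/user/devuser/testdata/generated")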

how to get spark-sql lineage

2019-05-15 Thread lk_spark
lk_spark

how can I dynamically parse json in kafka when using Structured Streaming

2019-09-16 Thread lk_spark
ce$6: org.apache.spark.sql.Encoder[org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema])org.apache.spark.sql.Dataset[org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema]. Unspecified value parameter evidence$6. val words = lines.map(line => { 2019-09-17 lk_spark

Re: Re: how can I dynamically parse json in kafka when using Structured Streaming

2019-09-17 Thread lk_spark
I want to parse the struct of the data dynamically, then write the data to Delta Lake; I think it can automatically merge the schema. 2019-09-17 lk_spark From: Tathagata Das Sent: 2019-09-17 16:13 Subject: Re: how can I dynamically parse json in kafka when using Structured Streaming To: "lk_spar
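
A minimal sketch of the usual Structured Streaming pattern: the JSON schema is supplied up front and from_json turns the Kafka value into columns; a truly dynamic schema requires restarting the query with a new schema. The field names and the lines DataFrame are assumptions taken from the quoted code.

  import org.apache.spark.sql.functions.{col, from_json}
  import org.apache.spark.sql.types._

  val schema = new StructType()
    .add("assembly", StringType)
    .add("datatime", StringType)
    .add("level", StringType)

  val parsed = lines
    .select(from_json(col("value").cast("string"), schema).as("data"))
    .select("data.*")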

how spark structured stream write to kudu

2019-11-25 Thread lk_spark
eNew2KUDU$$anon$1.process(CstoreNew2KUDU.scala:122) ... and SQLImplicits.scala:228 is : 227: implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] = { 228:DatasetHolder(_sqlContext.createDataset(s)) 229: } can anyone give me some help? 2019-11-25 lk_spark

Re: how spark structured stream write to kudu

2019-11-25 Thread lk_spark
I found _sqlContext is null, how do I resolve it? 2019-11-25 lk_spark From: "lk_spark" Sent: 2019-11-25 16:00 Subject: how spark structured stream write to kudu To: "user.spark" Cc: hi,all: I'm using spark 2.4.4 to readstream data from kafka and want to write to kudu
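
A minimal sketch, assuming Spark 2.4's foreachBatch and the kudu-spark2 connector on the classpath; inside foreachBatch each micro-batch is a plain DataFrame, which sidesteps the _sqlContext/implicits problem seen in a custom sink. The Kudu master address, table name, checkpoint path, and the streaming DataFrame named parsed are placeholders.

  import org.apache.kudu.spark.kudu.KuduContext

  val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

  val query = parsed.writeStream
    .option("checkpointLocation", "/tmp/kudu_ckpt")
    .foreachBatch { (batchDF: org.apache.spark.sql.DataFrame, batchId: Long) =>
      // upsert each micro-batch into the Kudu table
      kuduContext.upsertRows(batchDF, "impala::default.my_table")
    }
    .start()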

how to limit tasks num when read hive with orc

2019-11-11 Thread lk_spark
hi, all: I have a Hive table STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'; many of its files are very small, and when I use Spark to read it, thousands of tasks start. How can I limit the number of tasks? 2019-11-12 lk_spark
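
A minimal sketch of the knobs that usually control this, assuming Spark's native ORC reader is used for the Hive table; with it, many small files are packed into a single task bounded by the sizes below. The values are examples and the table name is hypothetical.

  spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
  spark.conf.set("spark.sql.files.maxPartitionBytes", 256L * 1024 * 1024)  // max bytes per task
  spark.conf.set("spark.sql.files.openCostInBytes", 8L * 1024 * 1024)      // padding added per file

  val df = spark.table("my_orc_table")
  println(df.rdd.getNumPartitions)   // rough view of how many tasks a scan would use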

Why does an NPE happen with multi threading in cluster mode but not client mode

2020-12-02 Thread lk_spark
hi, all: I'm using Spark 2.4 and I'm trying to use the SparkContext from multiple threads. I found an example: https://hadoopist.wordpress.com/2017/02/03/how-to-use-threads-in-spark-job-to-achieve-parallel-read-and-writes/ with some code like this: for (a <- 0 until 4) { val thread = new Thread {
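
The quoted example is cut off; a minimal sketch of the multi-threaded pattern it refers to, where each thread submits its own job through the shared SparkSession, with hypothetical paths.

  val threads = (0 until 4).map { a =>
    new Thread {
      override def run(): Unit = {
        // each thread runs an independent job on the same SparkSession
        spark.read.parquet(s"/data/input_$a")
          .write.mode("overwrite").parquet(s"/data/output_$a")
      }
    }
  }
  threads.foreach(_.start())
  threads.foreach(_.join())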

Re: NoSuchMethodError: org.apache.spark.sql.execution.command.CreateViewCommand.copy

2022-03-21 Thread lk_spark
sorry, it's a problem with my environment. At 2022-03-21 14:00:01, "lk_spark" wrote: hi, all: I got a strange error: bin/spark-shell --deploy-mode client Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLe

NoSuchMethodError: org.apache.spark.sql.execution.command.CreateViewCommand.copy

2022-03-21 Thread lk_spark
hi, all : I got a strange error: bin/spark-shell --deploy-mode client Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 22/03/21 13:51:39 WARN util.Utils: spark.executor.instances less than

Is 'Stage cancelled because SparkContext was shut down' an error

2022-09-28 Thread lk_spark
hi, all: when I try to merge an Iceberg table with Spark, I can see a failed job on the Spark UI, but the Spark application's final state is SUCCEEDED. I submitted an issue: https://github.com/apache/iceberg/issues/5876 I want to know: is this a real error? Thanks.

Re: Upgrading from Spark SQL 3.2 to 3.3 failed

2023-02-15 Thread lk_spark
, "lk_spark" wrote: hi,all : I have a sql statement wich can be run on spark 3.2.1 but not on spark 3.3.1 . when I try to explain it, will got error with message: org.apache.spark.sql.catalyst.expressions.Literal cannot be cast to org.apache.spark.sql.catalyst.expressions.AnsiCast execu

Upgrading from Spark SQL 3.2 to 3.3 failed

2023-02-15 Thread lk_spark
hi, all: I have a SQL statement which can be run on Spark 3.2.1 but not on Spark 3.3.1. When I try to explain it, I get an error with the message: org.apache.spark.sql.catalyst.expressions.Literal cannot be cast to org.apache.spark.sql.catalyst.expressions.AnsiCast. When I execute the SQL, the error stack is