Does --py-files place the files on the PYTHONPATH of the executor?

2019-01-24 Thread thinkdoom2
I ran spark-submit --help and found: --jars: driver and executor classpath; --files, --archives: working directory of each executor. But for --py-files it only says: place on the PYTHONPATH for Python apps. It doesn't describe whether the files are also placed on the executors (they will surely be placed on the driver). So
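For reference, a minimal spark-submit invocation using the flag in question (master and file names are hypothetical):

    spark-submit --master yarn --py-files deps.zip,util.py app.py

The question is whether deps.zip and util.py end up on the PYTHONPATH of the executors as well, or only on the driver.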

Re: Structured streaming from Kafka by timestamp

2019-01-24 Thread Shixiong(Ryan) Zhu
Hey Tomas, From your description, you just ran a batch query rather than a Structured Streaming query. The Kafka data source doesn't support filter pushdown right now, but that's definitely doable. One workaround here is setting proper "startingOffsets" and "endingOffsets" options when loading
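A minimal sketch of that workaround as a bounded batch read, assuming an active SparkSession named spark (topic name, broker, and offsets are hypothetical):

    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "host:9092")
      .option("subscribe", "events")
      .option("startingOffsets", """{"events":{"0":100}}""")  // per-partition JSON
      .option("endingOffsets", """{"events":{"0":200}}""")
      .load()
    df.createOrReplaceTempView("kafka_table")
    spark.sql("select count(*) from kafka_table").show()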

Re: How to force-quit a Spark application?

2019-01-24 Thread Marcelo Vanzin
Hi, On Tue, Jan 22, 2019 at 11:30 AM Pola Yao wrote: > "Thread-1" #19 prio=5 os_prio=0 tid=0x7f9b6828e800 nid=0x77cb waiting on > condition [0x7f9a123e3000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for
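One generic way to force the JVM down when a stray non-daemon thread like the one in the dump above keeps it alive (a sketch only, not necessarily the fix discussed in this thread; assumes an active SparkSession spark):

    try {
      spark.stop()   // release Spark resources first
    } finally {
      sys.exit(0)    // System.exit: terminates even with non-daemon threads still parked
    }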

Re: Structured streaming from Kafka by timestamp

2019-01-24 Thread Gabor Somogyi
Hi Tomas, as a general note, I don't fully understand your use case. You've mentioned structured streaming, but your query is more like a one-time SQL statement. Kafka doesn't support predicates in the way it's integrated with Spark. What can be done from the Spark perspective is to look for an offset for a
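A sketch of that timestamp-to-offset lookup using the Kafka consumer's offsetsForTimes API (broker, topic, and partition are hypothetical):

    import java.util.Properties
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition
    import scala.collection.JavaConverters._

    val props = new Properties()
    props.put("bootstrap.servers", "host:9092")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
    val tp = new TopicPartition("events", 0)
    val ts = java.sql.Timestamp.valueOf("2019-01-23 01:00:00").getTime
    // returns, per partition, the earliest offset whose timestamp is >= ts
    val offsets = consumer.offsetsForTimes(Map(tp -> java.lang.Long.valueOf(ts)).asJava)
    consumer.close()

The resulting offsets could then be plugged into the "startingOffsets" / "endingOffsets" options shown earlier.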

Structured streaming from Kafka by timestamp

2019-01-24 Thread Tomas Bartalos
Hello, I'm trying to read Kafka via Spark structured streaming, within a specific time range: select count(*) from kafka_table where timestamp > cast('2019-01-23 1:00' as TIMESTAMP) and timestamp < cast('2019-01-23 1:01' as TIMESTAMP); The problem is that the timestamp query

Re: Reading compacted Kafka topic is slow

2019-01-24 Thread Gabor Somogyi
Hi Tomas, I presume the 60 sec window means the trigger interval. A quick win could be to try structured streaming, because there the trigger interval is optional: if it is not specified, the system will check for availability of new data as soon as the previous processing has completed. BR, G
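A sketch of what that looks like in structured streaming, where omitting .trigger(...) makes each micro-batch start as soon as the previous one completes (assumes an active SparkSession spark; broker, topic, and paths are hypothetical):

    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host:9092")
      .option("subscribe", "compacted-topic")
      .load()
      .writeStream
      .format("parquet")
      .option("path", "/tmp/out")
      .option("checkpointLocation", "/tmp/ckpt")
      .start()   // no trigger specified: micro-batches run back to back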

Reading compacted Kafka topic is slow

2019-01-24 Thread Tomas Bartalos
Hello Spark folks, I'm reading a compacted Kafka topic with Spark 2.4, using a direct stream - KafkaUtils.createDirectStream(...). I have configured the necessary options for a compacted stream, so it's processed with CompactedKafkaRDDIterator. It works well; however, in case of many gaps in the topic, the
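For reference, a sketch of the setup described (names are hypothetical; assumes an active SparkSession spark, and that the "necessary option" meant is spark.streaming.kafka.allowNonConsecutiveOffsets=true in the SparkConf, which enables the gap-tolerant compacted-topic code path):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010._
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(spark.sparkContext, Seconds(60))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "host:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example")
    // direct stream over a compacted topic; offsets may have gaps after compaction
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("compacted-topic"), kafkaParams))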

Fwd: unsubscribe

2019-01-24 Thread Anahita Talebi
unsubscribe

unsubscribe

2019-01-24 Thread neeraj bhadani
unsubscribe

Reply: Re: How to get all input tables of a SPARK SQL 'select' statement

2019-01-24 Thread luby
Thanks all for your help. I'll try your suggestions. Thanks again :) From: "Shahab Yunus" To: "Ramandeep Singh Nanda" Cc: "Tomas Bartalos" , l...@china-inv.cn, "user @spark/'user @spark'/spark users/user@spark" Date: 2019/01/24 06:45 Subject: Re: How to get all input tables of a SPARK SQL
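For context on the thread's subject, one hedged sketch of extracting input table names from a statement's parsed logical plan (this relies on unstable internal APIs that differ across Spark versions; the SQL and an active SparkSession spark are assumed):

    import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

    val plan = spark.sessionState.sqlParser.parsePlan(
      "select a.x from t1 a join t2 b on a.id = b.id")
    // collect every unresolved relation, i.e. every table the query references
    val tables = plan.collect { case r: UnresolvedRelation => r.tableName }.distinct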