Does --py-files place the files on the PYTHONPATH of the executor?

2019-01-24 Thread thinkdoom2
I ran spark-submit --help and found: --jars: driver and executor classpath; --files, --archives: working directory of each executor. But for --py-files it only says: place on the PYTHONPATH for Python apps. It doesn't describe whether the files are also placed on the executors (they will surely be placed on the driver). So
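For reference, a minimal spark-submit invocation using the flag in question (master and file names are hypothetical):

    spark-submit --master yarn --py-files deps.zip,util.py app.py

The question is whether deps.zip and util.py end up on the PYTHONPATH of the executors as well, or only on the driver.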

Re: Structured streaming from Kafka by timestamp

2019-01-24 Thread Shixiong(Ryan) Zhu
Hey Tomas, From your description, you just ran a batch query rather than a Structured Streaming query. The Kafka data source doesn't support filter pushdown right now, but that's definitely doable. One workaround here is setting proper "startingOffsets" and "endingOffsets" options when loading
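A minimal sketch of that workaround as a bounded batch read, assuming an active SparkSession named spark (topic name, broker, and offsets are hypothetical):

    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "host:9092")
      .option("subscribe", "events")
      .option("startingOffsets", """{"events":{"0":100}}""")  // per-partition JSON
      .option("endingOffsets", """{"events":{"0":200}}""")
      .load()
    df.createOrReplaceTempView("kafka_table")
    spark.sql("select count(*) from kafka_table").show()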

Re: How to force-quit a Spark application?

2019-01-24 Thread Marcelo Vanzin
Hi, On Tue, Jan 22, 2019 at 11:30 AM Pola Yao wrote: > "Thread-1" #19 prio=5 os_prio=0 tid=0x7f9b6828e800 nid=0x77cb waiting on > condition [0x7f9a123e3000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for
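One generic way to force the JVM down when a stray non-daemon thread like the one in the dump above keeps it alive (a sketch only, not necessarily the fix discussed in this thread; assumes an active SparkSession spark):

    try {
      spark.stop()   // release Spark resources first
    } finally {
      sys.exit(0)    // System.exit: terminates even with non-daemon threads still parked
    }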

Re: Structured streaming from Kafka by timestamp

2019-01-24 Thread Gabor Somogyi
Hi Tomas, as a general note, I don't fully understand your use case. You've mentioned structured streaming, but your query is more like a one-time SQL statement. Kafka doesn't support predicates in the way it's integrated with Spark. What can be done from the Spark perspective is to look for an offset for a
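A sketch of that timestamp-to-offset lookup using the Kafka consumer's offsetsForTimes API (broker, topic, and partition are hypothetical):

    import java.util.Properties
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition
    import scala.collection.JavaConverters._

    val props = new Properties()
    props.put("bootstrap.servers", "host:9092")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
    val tp = new TopicPartition("events", 0)
    val ts = java.sql.Timestamp.valueOf("2019-01-23 01:00:00").getTime
    // returns, per partition, the earliest offset whose timestamp is >= ts
    val offsets = consumer.offsetsForTimes(Map(tp -> java.lang.Long.valueOf(ts)).asJava)
    consumer.close()

The resulting offsets could then be plugged into the "startingOffsets" / "endingOffsets" options shown earlier.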

Structured streaming from Kafka by timestamp

2019-01-24 Thread Tomas Bartalos
Hello, I'm trying to read Kafka via Spark structured streaming, within a specific time range: select count(*) from kafka_table where timestamp > cast('2019-01-23 1:00' as TIMESTAMP) and timestamp < cast('2019-01-23 1:01' as TIMESTAMP); The problem is that the timestamp query

Re: Reading compacted Kafka topic is slow

2019-01-24 Thread Gabor Somogyi
Hi Tomas, I presume the 60 sec window means the trigger interval. A quick win could be to try structured streaming, because there the trigger interval is optional: if it is not specified, the system will check for availability of new data as soon as the previous processing has completed. BR, G
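A sketch of what that looks like in structured streaming, where omitting .trigger(...) makes each micro-batch start as soon as the previous one completes (assumes an active SparkSession spark; broker, topic, and paths are hypothetical):

    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host:9092")
      .option("subscribe", "compacted-topic")
      .load()
      .writeStream
      .format("parquet")
      .option("path", "/tmp/out")
      .option("checkpointLocation", "/tmp/ckpt")
      .start()   // no trigger specified: micro-batches run back to back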

Reading compacted Kafka topic is slow

2019-01-24 Thread Tomas Bartalos
Hello Spark folks, I'm reading a compacted Kafka topic with Spark 2.4, using a direct stream - KafkaUtils.createDirectStream(...). I have configured the necessary options for a compacted stream, so it's processed with CompactedKafkaRDDIterator. It works well; however, in case of many gaps in the topic, the
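For reference, a sketch of the setup described (names are hypothetical; assumes an active SparkSession spark, and that the "necessary option" meant is spark.streaming.kafka.allowNonConsecutiveOffsets=true in the SparkConf, which enables the gap-tolerant compacted-topic code path):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010._
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(spark.sparkContext, Seconds(60))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "host:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example")
    // direct stream over a compacted topic; offsets may have gaps after compaction
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("compacted-topic"), kafkaParams))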

Fwd: unsubscribe

2019-01-24 Thread Anahita Talebi
unsubscribe

unsubscribe

2019-01-24 Thread neeraj bhadani
unsubscribe

Reply: Re: How to get all input tables of a SPARK SQL 'select' statement

2019-01-24 Thread luby
Thanks all for your help. I'll try your suggestions. Thanks again :) From: "Shahab Yunus" To: "Ramandeep Singh Nanda" Cc: "Tomas Bartalos" , l...@china-inv.cn, "user @spark/'user @spark'/spark users/user@spark" Date: 2019/01/24 06:45 Subject: Re: How to get all input tables of a SPARK SQL
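For context on the thread's subject, one hedged sketch of extracting input table names from a statement's parsed logical plan (this relies on unstable internal APIs that differ across Spark versions; the SQL and an active SparkSession spark are assumed):

    import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

    val plan = spark.sessionState.sqlParser.parsePlan(
      "select a.x from t1 a join t2 b on a.id = b.id")
    // collect every unresolved relation, i.e. every table the query references
    val tables = plan.collect { case r: UnresolvedRelation => r.tableName }.distinct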