The data ingestion is in the outermost portion of the foreachRDD block. Although
I now close the JDBC statement, the same exception happened again. It
seems it is not related to the data ingestion part.
On Wed, Apr 29, 2015 at 8:35 PM, Cody Koeninger c...@koeninger.org wrote:
Use lsof to see what
Did you use lsof to see what files were opened during the job?
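For reference, counting a process's open file descriptors looks like this (a generic sketch; the PID used here is the current shell, just for illustration — in practice it would be the Spark driver's JVM PID, found via jps or ps):

```shell
# Stand-in PID; replace with the Spark driver JVM's PID in a real check
pid=$$

# With lsof: list everything the process holds open, then count it
if command -v lsof >/dev/null 2>&1; then
  lsof -p "$pid" | wc -l
fi

# Without lsof (Linux): each entry under /proc/<pid>/fd is one open descriptor
ls "/proc/$pid/fd" | wc -l
```

Running this periodically while the streaming job is up shows whether the count is creeping toward the ulimit.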
On Thu, Apr 30, 2015 at 1:05 PM, Bill Jay bill.jaypeter...@gmail.com
wrote:
The data ingestion is in outermost portion in foreachRDD block. Although
now I close the statement of jdbc, the same exception happened again. It
seems it
I terminated the old job and started a new one. The Spark
Streaming job has now been running for 2 hours, and when I use lsof, I do not
see many files related to the Spark job.
BTW, I am running Spark Streaming in local[2] mode. The batch interval is 5
seconds and each batch has around 50 lines to read.
Thanks for the suggestion. I ran the command and the limit is 1024.
Based on my understanding, the Kafka connector should not open so many
files. Do you think there could be a socket leak? BTW, in every batch,
which is 5 seconds, I output some results to MySQL:
def ingestToMysql(data:
Can you run the command 'ulimit -n' to see the current limit ?
To configure ulimit settings on Ubuntu, edit */etc/security/limits.conf*
Cheers
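For reference, checking and raising the limit looks like this (a sketch; the values and the username in the limits.conf lines are placeholders):

```shell
# Current soft limit on open files for this shell
ulimit -n

# Raise the soft limit for this session only (example value; silently
# skipped here if it exceeds the hard limit)
ulimit -n 4096 2>/dev/null || true

# For a permanent change on Ubuntu, add lines like these to
# /etc/security/limits.conf (the username is a placeholder) and re-login:
#   spark  soft  nofile  16384
#   spark  hard  nofile  16384
ulimit -n
```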
On Wed, Apr 29, 2015 at 2:07 PM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Hi all,
I am using the direct approach to receive real-time data from
Maybe add statement.close() in a finally block?
Streaming / Kafka experts may have better insight.
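The suggestion, sketched generically: the statement type below is a stub standing in for java.sql.Statement, so the closing pattern runs without a real database, and the ingest function's shape is hypothetical, not Bill's actual code.

```scala
// Stub standing in for a java.sql.Statement, so the pattern runs without MySQL.
class StubStatement {
  var closed = false
  def execute(sql: String): Unit = ()  // a real statement would run the INSERT
  def close(): Unit = { closed = true }
}

// Hypothetical shape of the ingest function: whatever happens during the
// inserts, the finally block guarantees the statement is closed, so a
// failed batch cannot leak a file descriptor.
def ingestBatch(rows: Seq[String], stmt: StubStatement): Unit = {
  try {
    rows.foreach(r => stmt.execute(s"INSERT INTO results VALUES ('$r')"))
  } finally {
    stmt.close()
  }
}

val stmt = new StubStatement
ingestBatch(Seq("a", "b"), stmt)
println(stmt.closed)  // true
```

Without the finally, an exception mid-batch would skip the close and the statement's underlying socket would stay open, which is exactly how descriptors pile up batch after batch.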
On Wed, Apr 29, 2015 at 2:25 PM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Thanks for the suggestion. I ran the command and the limit is 1024.
Based on my understanding, the connector to Kafka
Hi all,
I am using the direct approach to receive real-time data from Kafka in the
following link:
https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
My code follows the word count direct example:
Is the function ingestToMysql running on the driver or on the executors?
Accordingly, you can try debugging while running in a distributed manner,
with and without calling the function.
If you don't get too many open files without calling ingestToMysql(), the
problem is likely to be in
This function is called in foreachRDD. I think it should be running in the
executors. I added statement.close() to the code and it is running. I
will let you know if this fixes the issue.
On Wed, Apr 29, 2015 at 4:06 PM, Tathagata Das t...@databricks.com wrote:
Is the function ingestToMysql
Also cc'ing Cody.
@Cody maybe there is a reason for doing connection pooling even if there is
no performance difference.
TD
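One common reason: a lazily initialized, per-JVM pool lets every batch on an executor reuse connections instead of opening a fresh socket each interval. A minimal sketch (the pool is a stub, not a real MySQL pool):

```scala
// Stub pool; a real one would manage java.sql.Connection objects.
class StubPool {
  var handedOut = 0
  def borrow(): String = { handedOut += 1; "connection" }
}

// A lazy val in an object is initialized once per JVM, so every batch that
// runs on the same executor reuses this pool rather than reconnecting.
object ConnectionPool {
  lazy val pool: StubPool = new StubPool
}

// Two "batches" borrowing from the pool hit the same instance:
ConnectionPool.pool.borrow()
ConnectionPool.pool.borrow()
println(ConnectionPool.pool.handedOut)  // 2
```

Even when pooling doesn't measurably beat connect-per-batch on latency, it bounds the number of sockets the JVM ever opens, which matters for the descriptor-leak symptom in this thread.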
On Wed, Apr 29, 2015 at 4:06 PM, Tathagata Das t...@databricks.com wrote:
Is the function ingestToMysql running on the driver or on the executors?
Accordingly you can
Use lsof to see what files are actually being held open.
That stacktrace looks to me like it's from the driver, not executors.
Where in foreach is it being called? The outermost portion of foreachRDD
runs in the driver, the innermost portion runs in the executors. From the
docs:
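(The quoted passage is truncated in the archive.) As a generic illustration of that split, with plain collections standing in for RDDs and partitions and a stub for the connection: the connection belongs inside the per-partition closure, which is the code Spark ships to executors, not in the outer foreachRDD body, which runs on the driver.

```scala
// Stub connection so the sketch runs without a database.
class StubConnection {
  var writes = 0
  var closed = false
  def write(row: String): Unit = { writes += 1 }
  def close(): Unit = { closed = true }
}

// Hypothetical per-partition ingest: one connection per partition, always
// closed in finally. In real Spark this closure is what foreachPartition
// ships to the executors; an Iterator stands in for a partition here.
def ingestPartition(partition: Iterator[String]): StubConnection = {
  val conn = new StubConnection  // created executor-side in real Spark
  try partition.foreach(conn.write)
  finally conn.close()
  conn
}

val partitions = Seq(Iterator("a", "b"), Iterator("c"))
val conns = partitions.map(ingestPartition)
println(conns.map(_.writes).sum)  // 3
println(conns.forall(_.closed))   // true
```

A connection created in the outer body instead would live on the driver, be captured by the closure, and either fail serialization or leak there — consistent with the stacktrace looking driver-side.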