Is your ES cluster reachable from your Spark cluster through the network / firewall?
Can you run the same query from the Spark master and slave nodes via curl or one
of the other clients?
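
For example, a quick reachability check to run from each worker node (a sketch;
"es-host" stands in for one of your Elasticsearch nodes, assuming the default
HTTP port 9200):

    curl -s -m 10 'http://es-host:9200/_cluster/health?pretty'

If that hangs from the workers but works from the machine running your browser
plugin, the network path is the likely culprit.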




Seems odd that GC issues would be a problem during the scan but not when running
the query from a browser plugin... Sounds like it could be a network issue.



—
Sent from Mailbox

On Thu, Apr 23, 2015 at 5:11 AM, Otis Gospodnetic
<otis.gospodne...@gmail.com> wrote:

> Hi,
> If you get an ES response back in 1-5 seconds, that's pretty slow.  Are these
> ES aggregation queries?  Costin may be right about GC possibly causing
> timeouts.  SPM <http://sematext.com/spm/> can give you all Spark and all
> key Elasticsearch metrics, including various JVM metrics.  If the problem
> is GC, you'll see it.  If you monitor both the Spark side and the ES side, you
> should be able to find some correlation with SPM.
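>
> For a quick check without SPM, jstat against the Elasticsearch JVM also shows
> GC activity (a sketch, assuming the JDK tools are on the path and <es-pid> is
> the ES process id):
>
>     jstat -gcutil <es-pid> 1000
>
> A jump in the FGC/FGCT columns while the query runs would point at GC pauses.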
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> On Wed, Apr 22, 2015 at 5:43 PM, Costin Leau <costin.l...@gmail.com> wrote:
>> Hi,
>>
>> First off, for Elasticsearch questions it's worth pinging the Elastic
>> mailing list, as that is more closely monitored than this one.
>>
>> Back to your question, Jeetendra is right that the exception indicates
>> no data is flowing back to the es-connector and Spark.
>> The default timeout is 1m [1], which should be more than enough for a typical
>> scenario. As a side note, the scroll size is 50 per task
>> (so 150 suggests 3 tasks).
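>>
>> If you need more headroom while investigating, both settings are configurable
>> [1]. A minimal sketch in Scala (the host name and index/type here are
>> assumptions, not your actual setup):
>>
>>     import org.apache.spark.{SparkConf, SparkContext}
>>     import org.elasticsearch.spark._            // adds esRDD to SparkContext
>>
>>     val conf = new SparkConf()
>>       .setAppName("es-read")
>>       .set("es.nodes", "es-host:9200")          // assumption: your ES node
>>       .set("es.http.timeout", "5m")             // raise the default 1m read timeout
>>       .set("es.scroll.size", "50")              // scroll hits per task (the default)
>>     val sc = new SparkContext(conf)
>>     val rdd = sc.esRDD("index/type")            // hypothetical index/type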
>>
>> Once the query is made, scrolling the documents is fast - likely there's
>> something else at hand that causes the connection to time out.
>> In such cases, you can enable logging on the REST package and see what
>> type of data transfer occurs between ES and Spark.
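>>
>> A sketch of enabling that logging via log4j (the REST package name is taken
>> from the stack trace below; adjust the log4j.properties your Spark processes
>> actually use):
>>
>>     # log the request/response traffic between the connector and ES
>>     log4j.logger.org.elasticsearch.hadoop.rest=TRACE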
>>
>> Do note that if a GC occurs, it can freeze Elasticsearch (or Spark), which
>> might trigger the timeout. Consider monitoring
>> Elasticsearch during
>> the query and see whether anything jumps - in particular the memory
>> pressure.
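>>
>> The node stats API is one way to watch that, e.g. (host name again an
>> assumption):
>>
>>     curl -s 'http://es-host:9200/_nodes/stats/jvm?pretty'
>>
>> and keep an eye on heap_used_percent and the GC collection times while the
>> Spark job runs.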
>>
>> Hope this helps,
>>
>> [1]
>> http://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html#_network
>>
>> On 4/22/15 10:44 PM, Adrian Mocanu wrote:
>>
>>> Hi
>>>
>>> Thanks for the help. My ES is up.
>>>
>>> Out of curiosity, do you know what the timeout value is? There are
>>> probably other things happening to cause the timeout;
>>> I don’t think my ES is that slow, but it’s possible that ES is taking too
>>> long to find the data. What I see happening is
>>> that it uses scroll to get the data from ES, about 150 items at a
>>> time. The usual delay when I perform the same query from a
>>> browser plugin ranges from 1-5 sec.
>>>
>>> Thanks
>>>
>>> *From:*Jeetendra Gangele [mailto:gangele...@gmail.com]
>>> *Sent:* April 22, 2015 3:09 PM
>>> *To:* Adrian Mocanu
>>> *Cc:* u...@spark.incubator.apache.org
>>> *Subject:* Re: ElasticSearch for Spark times out
>>>
>>> Basically, a read timeout means that no data arrived within the specified
>>> receive timeout period.
>>>
>>> A few things I would suggest:
>>>
>>> 1. Is your ES cluster up and running?
>>>
>>> 2. If 1 is yes, reduce the size of the index (make it a few KB) and
>>> then test.
>>>
>>> On 23 April 2015 at 00:19, Adrian Mocanu <amoc...@verticalscope.com
>>> <mailto:amoc...@verticalscope.com>> wrote:
>>>
>>> Hi
>>>
>>> I use the Elasticsearch package for Spark, and very often it times out
>>> reading data from ES into an RDD.
>>>
>>> How can I keep the connection alive (why doesn’t it stay alive? Is this a bug?)
>>>
>>> Here’s the exception I get:
>>>
>>> org.elasticsearch.hadoop.serialization.EsHadoopSerializationException:
>>> java.net.SocketTimeoutException: Read timed out
>>>
>>>                  at
>>> org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:86)
>>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>
>>>                  at
>>> org.elasticsearch.hadoop.serialization.ParsingUtils.doSeekToken(ParsingUtils.java:70)
>>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>
>>>                  at
>>> org.elasticsearch.hadoop.serialization.ParsingUtils.seek(ParsingUtils.java:58)
>>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>
>>>                  at
>>> org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:149)
>>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>
>>>                  at
>>> org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:102)
>>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>
>>>                  at
>>> org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:81)
>>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>
>>>                  at
>>> org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:314)
>>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>
>>>                  at
>>> org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76)
>>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>
>>>                  at
>>> org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46)
>>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>
>>>                  at
>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>> ~[scala-library.jar:na]
>>>
>>>                  at
>>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>> ~[scala-library.jar:na]
>>>
>>>                  at
>>> scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>>> ~[scala-library.jar:na]
>>>
>>>                  at
>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>> ~[scala-library.jar:na]
>>>
>>>                  at
>>> scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>> ~[scala-library.jar:na]
>>>
>>>                  at
>>> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>> ~[scala-library.jar:na]
>>>
>>>                  at
>>> org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
>>> ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>>
>>>                  at
>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>> ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>>
>>>                  at
>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>> ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>>
>>>                  at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>> ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>>
>>>                  at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>> ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>>
>>>                  at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> [na:1.7.0_75]
>>>
>>>                  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> [na:1.7.0_75]
>>>
>>>                  at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
>>>
>>> Caused by: java.net.SocketTimeoutException: Read timed out
>>>
>>>                  at java.net.SocketInputStream.socketRead0(Native Method)
>>> ~[na:1.7.0_75]
>>>
>>>                  at
>>> java.net.SocketInputStream.read(SocketInputStream.java:152) ~[na:1.7.0_75]
>>>
>>>                  at
>>> java.net.SocketInputStream.read(SocketInputStream.java:122) ~[na:1.7.0_75]
>>>
>>>                  at
>>> java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>>> ~[na:1.7.0_75]
>>>
>>>                  at
>>> java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>> ~[na:1.7.0_75]
>>>
>>>                  at
>>> org.apache.commons.httpclient.WireLogInputStream.read(WireLogInputStream.java:69)
>>> ~[commons-httpclient-3.1.jar:na]
>>>
>>>                  at
>>> org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:170)
>>> ~[commons-httpclient-3.1.jar:na]
>>>
>>>                  at
>>> java.io.FilterInputStream.read(FilterInputStream.java:133) ~[na:1.7.0_75]
>>>
>>>                  at
>>> org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:108)
>>> ~[commons-httpclient-3.1.jar:na]
>>>
>>>                  at
>>> org.elasticsearch.hadoop.rest.DelegatingInputStream.read(DelegatingInputStream.java:57)
>>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>
>>>                  at
>>> org.codehaus.jackson.impl.Utf8StreamParser.loadMore(Utf8StreamParser.java:172)
>>> ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>>
>>>                  at
>>> org.codehaus.jackson.impl.Utf8StreamParser.parseEscapedFieldName(Utf8StreamParser.java:1502)
>>> ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>>
>>>                  at
>>> org.codehaus.jackson.impl.Utf8StreamParser.slowParseFieldName(Utf8StreamParser.java:1404)
>>> ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>>
>>>                  at
>>> org.codehaus.jackson.impl.Utf8StreamParser._parseFieldName(Utf8StreamParser.java:1231)
>>> ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>>
>>>                  at
>>> org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:495)
>>> ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>>
>>>                  at
>>> org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:84)
>>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>>
>>>                  ... 22 common frames omitted
>>>
>>>
>>>
>>>
>> --
>> Costin
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
