Re: How to get aggregations working in Elasticsearch Spark adapter ?

michele crudele Wed, 01 Apr 2015 08:58:35 -0700

Thanks,

when is the 2.1 release coming?


Another question, which I think is related to this one btw... I was able to 
run this piece of code using facets:

val q5 = """
|{
|  "query": {
|    "match": { "_all": "error"}},
|  "facets":{
|    "appName": {"terms": {"field": "appName"}},
|    "sourceName": {"terms": {"field": "sourceName"}}}
|}
""".stripMargin


println("Query: " + q5)

val rdd = sc.esRDD("logs/app", q5);

What I get from the rdd are tuples (docID, Map[of the field=value]). Should 
I also expect to find facets ? If so, how do I get them ?


Il giorno mercoledì 1 aprile 2015 12:02:20 UTC+2, Costin Leau ha scritto:
>
> The short answer is that the connector relies on scan/scroll search for 
> its core functionality. And with aggs it needs 
> to switch the way it queries the cluster to a count search. 
> This is the last major feature that needs to be addressed before the 2.1 
> release. There's also an issue for it raised 
> here [1] which you can track. 
>
> Cheers, 
>
> [1] https://github.com/elastic/elasticsearch-hadoop/issues/276 
>
> On 4/1/15 12:53 PM, michele crudele wrote: 
> > 
> > I have ES, Spark, and ES hadoop adapter installed on my laptop. I wrote 
> a simple scala notebook to test ES adapter. 
> > Everything was fine until I started thinking at more sophisticated 
> features. This is the snippet that drives me crazy: 
> > 
> > %AddJar 
> file:///tools/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-hadoop-2.1.0.BUILD-SNAPSHOT.jar
>  
>
> > %AddJar 
> file:///tools/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-spark_2.10-2.1.0.BUILD-SNAPSHOT.jar
>  
>
> > 
> > import org.elasticsearch.spark.rdd._ 
> > 
> > val q2 = """{ 
> >      |"query" : { "term": { "appName": "console" } }, 
> >      |"aggregations": { 
> >      |  "unusual": { 
> >      |    "significant_terms": {"field": "pathname"} 
> >      |  } 
> >      |} 
> > |}""".stripMargin 
> > 
> > val res = sc.esRDD("logs/app", q2); 
> > 
> > println("Matches: " + res.count()) 
> > 
> > 
> > When I run the code I get this exception: 
> > 
> > Name: org.apache.spark.SparkException 
> > Message: Job aborted due to stage failure: Task 2 in stage 15.0 failed 1 
> times, most recent failure: Lost task 2.0 in stage 15.0 (TID 58, 
> localhost): org.apache.spark.util.TaskCompletionListenerException: 
> SearchPhaseExecutionException[Failed to execute phase [init_scan], all 
> shards failed; shardFailures {[N1R-UlgOQCGXCFCtbJ3sBQ][logrecords][2]: 
> ElasticsearchIllegalArgumentException[aggregations are not supported with 
> search_type=scan]}] 
> >         at 
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:76) 
>
> >         at org.apache.spark.scheduler.Task.run(Task.scala:58) 
> >         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) 
> >         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  
>
> >         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  
>
> >         at java.lang.Thread.run(Thread.java:745) 
> > 
> > 
> > "aggregations are not supported with search_type=scan", which is fine. 
> > The question is: how do I set search_type to the right value (e.g. 
> count) in the sc.esRDD() call? 
> > I tried several places in the q2 json with no success and I was not able 
> to find an answer through 
> > the documentation. I would appreciate any help. 
> > 
> > However, I see a possible inconsistency with the behaviour of the ES API 
> used directly via cURL. 
> > The command with the same query above, and without any setting about 
> search_type works correctly: 
> > 
> > curl 'localhost:9200/logs/app/_search?pretty' -d'{"query" : { "term": { 
> "appName": "console" } }, 
> > "aggregations": { "unusual": { "significant_terms": {"field": 
> "pathname"} }}}' 
> > 
> > returns hits:{} and aggregations:{}. Why the Spark integration does not 
> work the same ? 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> Groups "elasticsearch" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an email to 
> > elasticsearc...@googlegroups.com <javascript:> <mailto:
> elasticsearch+unsubscr...@googlegroups.com <javascript:>>. 
> > To view this discussion on the web visit 
> > 
> https://groups.google.com/d/msgid/elasticsearch/d044d380-a4b2-4d22-8990-60f318f7601a%40googlegroups.com
>  
> > <
> https://groups.google.com/d/msgid/elasticsearch/d044d380-a4b2-4d22-8990-60f318f7601a%40googlegroups.com?utm_medium=email&utm_source=footer>.
>  
>
> > For more options, visit https://groups.google.com/d/optout. 
>
> -- 
> Costin 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/63f36dfd-fdf7-46a8-b092-2df293b3d145%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: How to get aggregations working in Elasticsearch Spark adapter ?

Reply via email to