Hi all,

I have a Pig relation holding Tomcat logs with three fields: log_date, 
log_url and log_nb. 
I want to store it in ES with log_url as the document id, each document 
holding an array of nested maps, one per day: 


{ "log_url": "http://www.xxx.fr/index.html",
  "log_hits": [
     { 
         "log_nb": 1, 
         "log_date": "20150406"
     } ,
     { 
         "log_nb": 2, 
         "log_date": "20150407"
     } 
   ]
} 

This script will run every day, generating a new entry for each URL. So 
for a given log_url, the array should grow by one element each day.

As stated in the es-hadoop documentation 
http://www.elastic.co/guide/en/elasticsearch/hadoop/current/pig.html#tuple-names,
if es.mapping.pig.tuple.use.field.names (false by default) is set to true, 
tuples are treated as an array of maps when stored in ES.

The Pig script looks like this: 

b = LOAD ......
c = group b BY log_url;
d = FOREACH c GENERATE
   group AS log_url,
   TOTUPLE (log_date, log_nb) AS log_hits;

store d into 'myindex/myindex' 
using org.elasticsearch.hadoop.pig.EsStorage (
    'es.mapping.pig.tuple.use.field.names=true', 
    'es.write.operation=upsert', 
    'es.mapping.id=log_url'
);


The first time I run it, it creates the following document: 

{ "log_url": "http://www.xxx.fr/index.html",
  "log_hits": [
     { 
         "log_nb": 1, 
         "log_date": "20150406"
     }  
   ]
} 

So far, so good. 

When I run it again with a new date (say "20150407"), instead of appending a 
new entry to the embedded array "log_hits", it replaces the single existing 
element, and the ES document becomes: 

{ "log_url": "http://www.xxx.fr/index.html",
  "log_hits": [
     { 
         "log_nb": 2, 
         "log_date": "20150407"
     }  
   ]
}

I was expecting to get: 

{ "log_url": "http://www.xxx.fr/index.html",
  "log_hits": [
     { 
         "log_nb": 1, 
         "log_date": "20150406"
     } ,
     { 
         "log_nb": 2, 
         "log_date": "20150407"
     } 
   ]
} 
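
To make the difference concrete, here is a minimal Python sketch of the two 
behaviours. It is not es-hadoop code: the helper names merge_partial_doc and 
append_hits are hypothetical, and it assumes the upsert acts like a 
partial-document merge in which an array value is overwritten as a whole. 

```python
import copy


def merge_partial_doc(existing, partial):
    """Assumed upsert behaviour: nested maps are merged key by key,
    any other value (arrays included) is overwritten as a whole."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial_doc(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def append_hits(existing, partial):
    """Behaviour I am after: new log_hits entries are appended."""
    merged = copy.deepcopy(existing)
    new_hits = copy.deepcopy(partial.get("log_hits", []))
    merged.setdefault("log_hits", []).extend(new_hits)
    return merged


day1 = {
    "log_url": "http://www.xxx.fr/index.html",
    "log_hits": [{"log_nb": 1, "log_date": "20150406"}],
}
day2 = {
    "log_url": "http://www.xxx.fr/index.html",
    "log_hits": [{"log_nb": 2, "log_date": "20150407"}],
}

observed = merge_partial_doc(day1, day2)  # what I get today
expected = append_hits(day1, day2)        # what I would like

print(len(observed["log_hits"]), len(expected["log_hits"]))
```

With these two inputs, observed keeps only the 20150407 entry, while 
expected holds both days in order. 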

Is there a way to achieve that?

Thanks,

Philippe



