Hey, the side field as defined in your mapping (I assume you are using Elasticsearch 0.90.x) uses the standard analyzer, which removes stopwords by default. As "a" is a stopword, it gets removed as part of the indexing process - and that makes it impossible to search for. A good way to find out more about this is to play around with the analyze API. If you would like a nice UI on top of that, go with the inquisitor plugin.
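For example, a quick sketch of the analyze API (assuming a 0.90.x node on localhost:9200 with default settings; adjust host/port for your setup):

```shell
# Ask Elasticsearch how the standard analyzer tokenizes the string "a".
curl -XGET 'http://localhost:9200/_analyze?analyzer=standard&pretty' -d 'a'
# On 0.90.x defaults the "tokens" list in the response comes back empty,
# because "a" is an English stopword and is stripped during analysis.

# Compare with a non-stopword value:
curl -XGET 'http://localhost:9200/_analyze?analyzer=standard&pretty' -d 'b'
# Here a token survives, which is why your "side":"b" filter matches
# while the "side":"a" filter does not.
```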
The analyze API basically tells you how a string is tokenized and stored in the index, and which parts are removed or altered (due to stemming, for example). See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html

--Alex

On Thu, Feb 6, 2014 at 3:38 AM, Bobby Richards <[email protected]> wrote:

> So I have decided on using the week of year as the index and quotes as my
> type. I want to clarify a couple of things that I am seeing.
>
> First I create my index:
>
> curl 'http://localhost:9200/2014_6/quotes'
>
> Then I set my mapping:
>
> curl -XPUT 'http://localhost:9200/2014_6/quotes/_mapping' -d '
> {
>   "quotes" : {
>     "properties" : {
>       "time_stamp": {"type":"date"},
>       "symbol": {"type":"string"},
>       "side" : {"type":"string"},
>       "price" : {"type":"double"}
>     },
>     "_routing" : {
>       "required": true,
>       "path":"symbol"
>     },
>     "_timestamp" : {
>       "enabled" : true,
>       "path": "time_stamp",
>       "format": "date_hour_minute_second_millis"
>     }
>   }
> }
> '
>
> Now, because of this, I understand that when posting a new event to be
> indexed I do not need to specify quote?routing=<symbol>. However, my first
> question: since I must now include symbol in the JSON object I am posting,
> is this costing me more as far as storage? If I do not do this via the
> mapping, I have no problem adding the routing to the URI, especially if it
> saves me space.
>
> Second, I am seeing a couple of weird things...
> By running this:
>
> curl -XGET 'http://localhost:9200/2014_5/quotes/_search?routing=eurusd'
>
> I get the following, which is good, what I expect.
> {"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":3,"max_score":1.0,"hits":[
>   {"_index":"2014_5","_type":"quotes","_id":"ZW5u1nCHTGW-xToRy8Yy5g","_score":1.0, "_source" : { "time_stamp":1391653001000, "symbol":"eurusd", "side":"a", "price":1.3456}},
>   {"_index":"2014_5","_type":"quotes","_id":"ok4FLnrfR4u2CnJ3lVNKkg","_score":1.0, "_source" : { "time_stamp":1391653001000, "symbol":"eurusd", "side":"b", "price":1.3457}},
>   {"_index":"2014_5","_type":"quotes","_id":"1eG5m0riSoiDEquQ3I-QSA","_score":1.0, "_source" : { "time_stamp":1391653001100, "symbol":"eurusd", "side":"b", "price":1.3458}}
> ]}}
>
> However, if you will notice, the first entry is of side "a". By running the
> following I get nothing:
>
> curl -XGET 'http://localhost:9200/2014_5/quotes/_search?routing=eurusd' -d '
> {"query":{"filtered":{"query":{"match_all":{}},"filter":{"term":{"side":"a"}}}}}'
>
> However, if I change side to "b" I get 2, as I would expect. Is there some
> reserved feature that would limit me from searching for "a", or is there
> some text search thing I am not thinking about?
>
> Finally, I have added a few usdjpy quotes, which are routed to a separate
> shard. In my query I accidentally typed usejpy and I got the two eurusd
> events, even though it honored the side filter.
> Correcting the symbol, I get what I would expect. Is this another text
> search 'thing'? All I can think of is that by mistyping, the e matches the
> eur in the other indexed items.
>
> I just want to understand fully what I have going on there, thanks.
>
> On Saturday, February 1, 2014 2:27:55 PM UTC-6, Bobby Richards wrote:
>>
>> Wanting to get some advice on how to go about design.
>> I have some currency market data and I get roughly 10 million events a
>> week, currently stored in postgres; it actually ends up being about 10
>> gigs, though I would like to work on getting this down, obviously. The
>> data is seldom queried, but I have all of my other data in Elasticsearch,
>> which I love. I am trying to determine the best way to store this.
>>
>> I would like to query by symbol and time, and to index by month so I can
>> drop months whenever. I guess that would mean month/symbol/(unixtime for
>> minute).
>>
>> I am far from a data guy, so I am looking for direction, thoughts,
>> etc... Is this even a good use case for Elasticsearch?
>>
>> Thanks,
>> Bobby

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGCwEM9Vj-Mv3vQGQBipbR7c11cfrc2AZ_5PnVm%2BOS72DMuifg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.
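[As a follow-up sketch, not something stated in the thread: one common way around the stopword problem is to map the side field (and symbol, for exact matching) as a not_analyzed string, so the raw value "a" is indexed verbatim instead of going through the standard analyzer. The index name 2014_7 below is hypothetical; existing field mappings cannot be changed in place, so this assumes a fresh index.]

```shell
# Hypothetical mapping for a new weekly index: "side" and "symbol" are
# declared not_analyzed, so their values bypass the standard analyzer
# and are stored as single, exact terms.
curl -XPUT 'http://localhost:9200/2014_7/quotes/_mapping' -d '
{
  "quotes" : {
    "properties" : {
      "time_stamp": {"type":"date"},
      "symbol": {"type":"string", "index":"not_analyzed"},
      "side" : {"type":"string", "index":"not_analyzed"},
      "price" : {"type":"double"}
    }
  }
}'

# With that mapping, the original term filter on side "a" should match,
# since "a" is now indexed as-is rather than removed as a stopword.
curl -XGET 'http://localhost:9200/2014_7/quotes/_search?routing=eurusd' -d '
{"query":{"filtered":{"query":{"match_all":{}},"filter":{"term":{"side":"a"}}}}}'
```

A not_analyzed symbol field also makes symbol lookups strictly exact-match, which tends to make typo behavior like the usejpy case above easier to reason about.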
