Re: design advice for market data

Alexander Reelsen Thu, 06 Feb 2014 02:34:48 -0800

Hey,

The side field as defined in your mapping (I assume you use elasticsearch
0.90.x) uses the standard analyzer, which by default removes English
stopwords. As "a" is a stopword, it is removed during indexing, and that
makes it impossible to search for. A good way to find out more about this is
to play around with the analyze API. If you want a nice UI on top of that,
try the inquisitor plugin.


The analyze API tells you how a string is tokenized and stored in the
index, and which parts are removed or altered (by stemming, for example).

See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html
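
To make the effect concrete, here is a small Python sketch (not Elasticsearch code) that mimics what happens at index and query time. It assumes a simplified standard analyzer: lowercase, split on non-alphanumeric characters, then drop stopwords from a list modeled on Lucene's default English set. The real standard tokenizer follows Unicode word-break rules, so treat this as an approximation; analyze and term_filter are hypothetical helpers.

```python
import re

# Stopword list modeled on Lucene's default English stop set (illustrative).
ENGLISH_STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def analyze(text):
    """Simplified 'standard' analysis: lowercase, tokenize, drop stopwords.

    Returns the tokens that would end up in the inverted index.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in ENGLISH_STOPWORDS]


def term_filter(docs, field, value):
    """A term filter is not analyzed: it looks for the exact token `value`."""
    return [d for d in docs if value in analyze(d[field])]


# The three eurusd quotes from the search output in this thread.
docs = [
    {"symbol": "eurusd", "side": "a", "price": 1.3456},
    {"symbol": "eurusd", "side": "b", "price": 1.3457},
    {"symbol": "eurusd", "side": "b", "price": 1.3458},
]

print(analyze("a"))                         # []  ("a" is a stopword)
print(analyze("b"))                         # ['b']
print(len(term_filter(docs, "side", "a")))  # 0
print(len(term_filter(docs, "side", "b")))  # 2
```

The term filter looks for the exact token "a", which was never indexed, so nothing matches, while "b" survives analysis and matches twice. In 0.90.x one common way to keep short codes like these searchable is to map the field with "index" : "not_analyzed", so the whole value is indexed as a single term.
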


--Alex


On Thu, Feb 6, 2014 at 3:38 AM, Bobby Richards <[email protected]> wrote:

> So I have decided on using the week of year as the index and quotes as my
> type. I want to clarify a couple of things that I am seeing.
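
A minimal Python sketch of deriving such a weekly index name from an event's epoch-millisecond time_stamp. The "<year>_<week>" shape matches names like 2014_6 used in this thread, but ISO 8601 week numbering and UTC are assumptions, and weekly_index_name is a hypothetical helper.

```python
from datetime import datetime, timezone


def weekly_index_name(epoch_millis):
    """Map an epoch-millisecond timestamp to a "<year>_<week>" index name.

    Assumes ISO 8601 week numbering in UTC. The ISO year (not the calendar
    year) is used so that days around New Year land in the same index as
    the rest of their week.
    """
    dt = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    iso_year, iso_week, _ = dt.isocalendar()
    return f"{iso_year}_{iso_week}"


print(weekly_index_name(1391653001000))  # 2014_6  (2014-02-06 UTC)
```

With one index per week, dropping a week of data amounts to deleting one index, e.g. curl -XDELETE 'http://localhost:9200/2014_5'.
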
>
> First I create my index:
>
>   curl -XPUT 'http://localhost:9200/2014_6/'
>
> Then I set my mapping:
>
>   curl -XPUT 'http://localhost:9200/2014_6/quotes/_mapping' -d '
>   {
>     "quotes" : {
>       "properties" : {
>         "time_stamp" : {"type" : "date"},
>         "symbol" : {"type" : "string"},
>         "side" : {"type" : "string"},
>         "price" : {"type" : "double"}
>       },
>       "_routing" : {
>         "required" : true,
>         "path" : "symbol"
>       },
>       "_timestamp" : {
>         "enabled" : true,
>         "path" : "time_stamp",
>         "format" : "date_hour_minute_second_millis"
>       }
>     }
>   }'
>
> Now, because of this, I understand that when I post a new event to be
> indexed I do not need to specify quotes?routing=<symbol>. However, my first
> question: I now have to include symbol in the JSON object I post. Is this
> costing me more storage? If I do not set the routing up in the mapping, I
> have no problem adding it to the URI instead, especially if that saves me
> space.
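
A rough way to size the storage question is to count the bytes the symbol field adds to each stored _source document. This Python sketch uses the sample eurusd event from the search output in this thread and the 10 million events/week figure from the original question. It measures uncompressed compact JSON only, so it ignores stored-field compression and the cost of the indexed terms; read it as a ballpark, not a measurement.

```python
import json

# Sample event taken from the search output in this thread.
event = {"time_stamp": 1391653001000, "symbol": "eurusd",
         "side": "a", "price": 1.3456}
without_symbol = {k: v for k, v in event.items() if k != "symbol"}

# Compact JSON (no spaces), roughly what is stored in _source before compression.
compact = {"separators": (",", ":")}
extra_bytes = (len(json.dumps(event, **compact))
               - len(json.dumps(without_symbol, **compact)))

events_per_week = 10_000_000  # figure quoted in the original question
print(extra_bytes)                                # 18
print(extra_bytes * events_per_week / 1e6, "MB")  # 180.0 MB
```

The 18 bytes are just the characters of "symbol":"eurusd", in the raw document; actual on-disk cost will differ.
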
>
> Second, I am seeing a couple of weird things. By running this:
>
>   curl -XGET 'http://localhost:9200/2014_5/quotes/_search?routing=eurusd'
>
> I get the following, which is what I expect:
>
> {"took":1,"timed_out":false,
>  "_shards":{"total":1,"successful":1,"failed":0},
>  "hits":{"total":3,"max_score":1.0,"hits":[
>   {"_index":"2014_5","_type":"quotes","_id":"ZW5u1nCHTGW-xToRy8Yy5g","_score":1.0,
>    "_source":{"time_stamp":1391653001000,"symbol":"eurusd","side":"a","price":1.3456}},
>   {"_index":"2014_5","_type":"quotes","_id":"ok4FLnrfR4u2CnJ3lVNKkg","_score":1.0,
>    "_source":{"time_stamp":1391653001000,"symbol":"eurusd","side":"b","price":1.3457}},
>   {"_index":"2014_5","_type":"quotes","_id":"1eG5m0riSoiDEquQ3I-QSA","_score":1.0,
>    "_source":{"time_stamp":1391653001100,"symbol":"eurusd","side":"b","price":1.3458}}]}}
>
> However, notice that the first entry has side "a". Running the following
> returns nothing:
>
>   curl -XGET 'http://localhost:9200/2014_5/quotes/_search?routing=eurusd' -d '
>   {"query":{"filtered":{"query":{"match_all":{}},"filter":{"term":{"side":"a"}}}}}'
>
> If I change side to "b", I get 2 hits, as I would expect. Is there some
> reserved feature that stops me from searching for "a", or is there some
> text-search behavior I am not thinking about?
>
> Finally, I have added a few usdjpy quotes, which are routed to a separate
> shard. In my query I accidentally typed usejpy and got the two eurusd
> events back, and the side filter was still honored. With the symbol
> corrected, I get what I expect. Is this another text-search "thing"? All I
> can think of is that the mistyped "e" matches the "eur" in the other
> indexed items.
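
Routing does not filter documents; it only decides which shard a request goes to. The search response above reports "_shards":{"total":1}, i.e. a single shard was searched. Any routing string, including a typo like usejpy, hashes to some shard, and the query then returns every matching document on that shard, whatever its symbol. The Python sketch below illustrates the idea with a DJB2-style hash taken modulo the shard count. The hash function and the 5-shard count (the 0.90.x default) are assumptions for illustration, not necessarily what your cluster computes.

```python
NUM_SHARDS = 5  # 0.90.x default number of primary shards (assumed unchanged)


def djb_hash(value):
    """Illustrative DJB2-style string hash, wrapped to a signed 32-bit int."""
    h = 5381
    for ch in value:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def shard_for(routing, num_shards=NUM_SHARDS):
    """Pick a shard for a routing value (Python's % is non-negative here)."""
    return djb_hash(routing) % num_shards


# Every string maps to *some* shard; with 5 shards and 6 symbols, at least
# two symbols must share a shard, so routing alone cannot separate them.
symbols = ["eurusd", "usdjpy", "usejpy", "gbpusd", "audusd", "usdchf"]
for s in symbols:
    print(s, shard_for(s))
```

To restrict results to one symbol, add a term filter on symbol alongside the side filter; the routing parameter then only narrows which shard is searched.
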
>
> I just want to understand fully what I have going on there, thanks.
>
> On Saturday, February 1, 2014 2:27:55 PM UTC-6, Bobby Richards wrote:
>>
>> I would like some advice on how to go about the design. I have some
>> currency market data: roughly 10 million events a week, currently stored
>> in postgres, which ends up being about 10 gigs, though I would obviously
>> like to get that down. The data is seldom queried, but I have all of my
>> other data in elasticsearch, which I love. I am trying to determine the
>> best way to store this.
>>
>> I would like to query by symbol and time, and index by month so I can
>> drop months whenever. I guess that would mean 'month/symbol/(unixtime for
>> minute)'.
>>
>> I am far from a data guy, so I am looking for direction, thoughts,
>> etc. Is this even a good use case for elasticsearch?
>>
>> Thanks,
>> Bobby
>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/24b53357-be8b-4401-95eb-3581765af41a%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>

