Michael, did you ever solve this? I'm about to encounter this very same
issue and I'm looking into what solutions are available to me.
On Friday, September 19, 2014 6:21:32 AM UTC-4, Michael Chen wrote:
>
> Let me make the question clearer. The challenge we have now is how to
> index an EAV[1] model database.
>
> Let's take Google Forms as an example. Every user can create a form. They
> can choose from various field types including text, number, choice, etc.
> They construct a form like this:
>
> Form 1: a survey
> - field_1: type=text
> - field_2: type=number
> - field_3: type=choice
>
> People then submit data entries to this form, like:
>
> {
> field_1: "hello",
> field_2: 20,
> field_3: ["red"]
> }
>
> You can imagine all these data entries being saved into one single Mongo
> collection, "entries".
>
> Well, the second user might create another form like this:
>
> Form 2: a questionnaire
> - field_1: type=number
> - field_2: type=text
> - field_3: type=number
> - field_4: type=text
>
> The data submission might look like this:
>
> {
> field_1: 100,
> field_2: "hello questionnaire",
> field_3: 20,
> field_4: "this is my answer"
> }
>
> Indexing the second data entry while the first one is already in ES will
> throw a NumberFormatException, because ES guessed from the first entry that
> field_2 is a number. Transforming every value into a string would then make
> sense, but...
>
> Any thoughts?
>
> [1]EAV: Entity–attribute–value model,
> http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
>
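To make the failure mode above concrete, here is a toy model in Python of a "first value wins" mapping. It is only a sketch of the behaviour described in this thread, not Elasticsearch code: infer_type, index_entry and try_index are hypothetical helpers, and the type names are simplified.

```python
# Toy model of first-value-wins dynamic mapping. This is NOT Elasticsearch
# code; it only mimics the behaviour described in the thread.


def infer_type(value):
    """Guess a field type from the first value seen (hypothetical helper)."""
    if isinstance(value, (int, float)):
        return "number"
    # Strings and lists of strings are both treated as "string" here.
    return "string"


def index_entry(mapping, entry):
    """Check one entry against the shared mapping, extending it as needed."""
    for field, value in entry.items():
        # The first value ever seen for a field fixes its type.
        expected = mapping.setdefault(field, infer_type(value))
        if expected == "number" and not isinstance(value, (int, float)):
            try:
                float(value)
            except (TypeError, ValueError):
                raise ValueError(
                    f"{field}: cannot parse {value!r} as a number"
                ) from None
        # A "string" field accepts any value in this toy model.


def try_index(mapping, entry):
    """Return None on success, or the error message on a type conflict."""
    try:
        index_entry(mapping, entry)
        return None
    except ValueError as exc:
        return str(exc)


if __name__ == "__main__":
    shared = {}
    form_1_entry = {"field_1": "hello", "field_2": 20, "field_3": ["red"]}
    form_2_entry = {
        "field_1": 100,
        "field_2": "hello questionnaire",
        "field_3": 20,
        "field_4": "this is my answer",
    }
    print(try_index(shared, form_1_entry))
    print(try_index(shared, form_2_entry))
```

In this model only field_2 of the second entry is rejected: a number arriving in a field first seen as a string is accepted, while a non-numeric string arriving in a field first seen as a number is not.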
> On Fri, Sep 19, 2014 at 6:01 PM, Michael Chen <[email protected]> wrote:
>
>> We cannot guarantee that field_1 is always an address. In Form 1, field_1
>> might be an address, while in another form it might be a string, a number,
>> or anything else. Think about designing the storage for Google Forms and
>> its data entries.
>>
>> Re "you could force each field to be a String and do the transformation
>> at a client level."
>>
>> Forcing means serializing all data into a string, right? The example JSON
>> from my previous email would be transformed into something like
>>
>> { field_1: "\{country: \"US\", province: \"CA\", city: \"New York\",
>> address: \"Street Address\"\}" }
>>
>> Then we are not able to do the aggregation.
>>
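One way to read "force each field to be a String" without losing the nested structure is to convert only the leaf values and leave objects and lists as they are, so that a path such as field_1.country still exists as its own string field. A minimal Python sketch under that assumption (stringify_leaves is a hypothetical helper, not something proposed in this thread):

```python
def stringify_leaves(value):
    """Convert every scalar to str while keeping dicts and lists intact."""
    if isinstance(value, dict):
        return {key: stringify_leaves(item) for key, item in value.items()}
    if isinstance(value, list):
        return [stringify_leaves(item) for item in value]
    # Scalars (numbers, strings, booleans, None) all become strings here.
    return str(value)


if __name__ == "__main__":
    entry = {
        "field_1": {
            "country": "US",
            "province": "CA",
            "city": "New York",
            "address": "Street Address",
        },
        "field_2": 20,
    }
    print(stringify_leaves(entry))
```

This does not help when field_1 is an object in one form and a plain value in another; that is still a conflict between an object field and a string field.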
>> On Fri, Sep 19, 2014 at 5:50 PM, David Pilato <[email protected]> wrote:
>>
>>> I don't get it.
>>>
>>> If field_1.country is a String, why can't you aggregate on it?
>>>
>>> --
>>> *David Pilato* | Technical Advocate | *elasticsearch.com*
>>> <http://elasticsearch.com>
>>> [email protected]
>>> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr
>>> <https://twitter.com/elasticsearchfr> | @scrutmydocs
>>> <https://twitter.com/scrutmydocs>
>>>
>>>
>>>
>>> On September 19, 2014 at 08:27:19, Michael Chen ([email protected]) wrote:
>>>
>>> Thanks David. Given how the system behaves, having every type as a string
>>> is fine for queries. But at the aggregation level it might be trouble. For
>>> example, an address type is a complex JSON object:
>>>
>>> { field_1: { country: "US", province: "CA", city: "New York", address:
>>> "Street Address"} }
>>>
>>> If we transform this type into any form of string and try to
>>> aggregate by country/state, it will be VERY hard, if not impossible.
>>>
>>> On Fri, Sep 19, 2014 at 2:15 PM, David Pilato <[email protected]> wrote:
>>>
>>>> You could have one type per form although the cluster state will be
>>>> very big.
>>>> But you should test that option.
>>>>
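For the "one type per form" option, each form definition could be translated into its own explicit mapping instead of relying on type guessing. The sketch below only builds the mapping body as a Python dict. It assumes the Elasticsearch 1.x mapping layout ({type: {"properties": ...}}) in use when this thread was written; the FIELD_TYPE_TO_ES table, the type name "form_1" and build_type_mapping are illustrative assumptions, not anything given in the thread.

```python
# Assumed translation from the thread's form field types to Elasticsearch 1.x
# core types; the right-hand side is an illustrative choice.
FIELD_TYPE_TO_ES = {
    "text": "string",
    "number": "double",
    "choice": "string",
}


def build_type_mapping(type_name, form_fields):
    """Build a mapping body for one form (hypothetical helper).

    form_fields maps field names to the form's own field types,
    e.g. {"field_1": "text", "field_2": "number"}.
    """
    properties = {
        name: {"type": FIELD_TYPE_TO_ES[field_type]}
        for name, field_type in form_fields.items()
    }
    return {type_name: {"properties": properties}}


if __name__ == "__main__":
    form_1 = {"field_1": "text", "field_2": "number", "field_3": "choice"}
    print(build_type_mapping("form_1", form_1))
```

Later Elasticsearch versions require same-named fields to have the same mapping across all types in one index, and eventually dropped multiple types per index, so this option should be checked against the version actually in use.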
>>>> Or if you don't really search for numbers as numbers (I mean with
>>>> Range queries/filters), you could force each field to be a String and do
>>>> the transformation at a client level.
>>>>
>>>> My 2 cents
>>>>
>>>>
>>>> On September 19, 2014 at 05:31:38, Michael Chen ([email protected]) wrote:
>>>>
>>>> Hi,
>>>>
>>>> We have a system very much like Google Forms, which allows users to
>>>> design their own forms with various fields (single-line text, paragraph,
>>>> number, address, etc.). As you might guess, it runs on top of MongoDB.
>>>> It now has 120K forms with nearly 10 million entries.
>>>>
>>>> Recently we found a performance bottleneck in our queries. After doing
>>>> every possible performance tuning on the MongoDB side, we decided to index
>>>> the form entries into Elasticsearch. And there is a problem:
>>>>
>>>> Given a Form A that has field_1 as a string and field_2 as a number,
>>>> a data entry might look like: { field_1: "hello", field_2: 100 }
>>>>
>>>> Form B could have field_1 as a number and field_2 as a string; a data
>>>> entry will look like { field_1: 100, field_2: "hello form" }
>>>>
>>>> We have successfully created an index "entries" in ES, and can index the
>>>> first entry successfully. But the second one fails for an obvious reason:
>>>> type mismatch.
>>>>
>>>> I am not sure how to deal with this problem. I definitely don't want to
>>>> create 120K indices, one for every single form. And I am not sure it's
>>>> feasible to write a custom transform script that makes the indexed types
>>>> identical across all entries.
>>>>
>>>> Any suggestions? I would much appreciate any response.
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/4a6d47d7-ae0e-44f5-bd3a-756ea94e3899%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>> --
>>> Michael Chen
>>> --------------------------------
>>> Blog: http://michael.nona.name
>>> GTalk/Twitter/Facebook/Yahoo/Skype: mechiland
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/0c7013de-7549-4717-a83c-17cc7b496f16%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.