Let me make the question clearer. The challenge we have now is how to
index an EAV[1] model database.
Let's take Google Forms as an example. Every user can create a form. They
can choose from various field types including text, number, choice, etc.
They construct a form like this:
Form 1: a survey
- field_1: type=text
- field_2: type=number
- field_3: type=choice
And people submit data entries to this form, with data like:
{
field_1: "hello",
field_2: 20,
field_3: ["red"]
}
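To make the EAV shape concrete, here is a minimal Python sketch of how one such entry can be viewed as (entity, attribute, value) rows. The names `EavRow` and `to_eav_rows` and the entry id "form1-entry1" are made up for illustration; this is not how our system actually stores the data.

```python
from typing import Any, NamedTuple


class EavRow(NamedTuple):
    entity: str     # which entry the value belongs to (hypothetical id)
    attribute: str  # field name, e.g. "field_1"
    value: Any      # submitted value; its type depends on the form


def to_eav_rows(entry_id: str, entry: dict[str, Any]) -> list[EavRow]:
    """Flatten one submitted entry into one row per field."""
    return [EavRow(entry_id, name, value) for name, value in entry.items()]


form1_entry = {"field_1": "hello", "field_2": 20, "field_3": ["red"]}
rows = to_eav_rows("form1-entry1", form1_entry)
for row in rows:
    print(row)
```

The point is that the value column has no fixed type: it depends on which form the attribute belongs to.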
And you can imagine that all these data entries are saved into one single
MongoDB collection, "entries".
The second user might create another form like this:
Form 2: a questionnaire
- field_1: type=number
- field_2: type=text
- field_3: type=number
- field_4: type=text
The data submission might look like this:
{
field_1: 100,
field_2: "hello questionare",
field_3: 20,
field_4: "this is my answer"
}
Indexing the second data entry while the first one is already in ES will throw
a NumberFormatException, because ES guessed from the first entry that field_2
should be a number. Transforming all values into strings then makes sense, but
it brings back the aggregation problem described in my previous email below.
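To illustrate the failure, here is a minimal sketch in plain Python. It is not ES code: `ToyIndex`, `guess_type` and `stringify` are hypothetical names, `ValueError` stands in for NumberFormatException, and the rule "the first value seen fixes the field type" is a simplification of the behaviour described above.

```python
import json
from typing import Any


def guess_type(value: Any) -> str:
    """Rough stand-in for guessing a field type from the first value seen."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "number"
    if isinstance(value, str):
        return "string"
    return "other"


class ToyIndex:
    """Toy model of one index shared by all forms (an assumption, not ES)."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}
        self.docs: list[dict[str, Any]] = []

    def index(self, doc: dict[str, Any]) -> None:
        new_mapping = dict(self.mapping)  # commit only if the whole doc is valid
        for name, value in doc.items():
            field_type = new_mapping.setdefault(name, guess_type(value))
            if field_type == "number" and guess_type(value) != "number":
                try:
                    float(value)  # a numeric string such as "20" would still parse
                except (TypeError, ValueError):
                    raise ValueError(
                        f"{name}: cannot parse {value!r} as number"
                    ) from None
        self.mapping = new_mapping
        self.docs.append(doc)


def stringify(doc: dict[str, Any]) -> dict[str, str]:
    """Client-side transformation: turn every value into a string."""
    return {
        name: value if isinstance(value, str) else json.dumps(value)
        for name, value in doc.items()
    }


form1_entry = {"field_1": "hello", "field_2": 20, "field_3": ["red"]}
form2_entry = {
    "field_1": 100,
    "field_2": "hello questionare",
    "field_3": 20,
    "field_4": "this is my answer",
}

index = ToyIndex()
index.index(form1_entry)
try:
    index.index(form2_entry)
    conflict = None
except ValueError as exc:
    conflict = str(exc)
print(conflict)

string_index = ToyIndex()
string_index.index(stringify(form1_entry))
string_index.index(stringify(form2_entry))
print(string_index.mapping)
```

In this toy model the second entry is rejected on field_2, while the stringified entries both go in. The cost is that every field is now a string, so numbers can no longer be treated as numbers.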
Any thoughts?
[1] EAV: Entity–attribute–value model,
http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
On Fri, Sep 19, 2014 at 6:01 PM, Michael Chen <[email protected]> wrote:
> We cannot guarantee that field_1 is always an address. In Form 1, field_1
> might be an address, while in another form it might be a string, a number,
> or anything else. Think about designing the storage for Google Forms and its
> data entries.
>
> Re "you could force each field to be a String and do the transformation
> at a client level."
>
> Forcing means serializing all data into a string, right? The example JSON
> from my previous email would be transformed into something like
>
> { field_1: "\{country: \"US\", province: \"CA\", city: \"New York\",
> address: \"Street Address\"\}" }
>
> Then we are not able to do the aggregation.
>
> On Fri, Sep 19, 2014 at 5:50 PM, David Pilato <[email protected]> wrote:
>
>> I don't get it.
>>
>> If field_1.country is a String, why can't you aggregate on it?
>>
>> --
>> *David Pilato* | Technical Advocate | *elasticsearch.com
>> <http://elasticsearch.com>*
>> [email protected]
>> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr
>> <https://twitter.com/elasticsearchfr> | @scrutmydocs
>> <http://twitter.com/scrutmydocs>
>> <https://twitter.com/scrutmydocs>
>>
>>
>>
>> On 19 September 2014 at 08:27:19, Michael Chen ([email protected])
>> wrote:
>>
>> Thanks David. Given how the system behaves, having every type as a string
>> is fine for queries. But for aggregations it might be trouble. For
>> example, the address type is a complex JSON object:
>>
>> { field_1: { country: "US", province: "CA", city: "New York", address:
>> "Street Address"} }
>>
>> If we transform this type into any form of string and then try to
>> aggregate by country/state, it will be very hard, if not impossible.
>>
>> On Fri, Sep 19, 2014 at 2:15 PM, David Pilato <[email protected]> wrote:
>>
>>> You could have one type per form although the cluster state will be
>>> very big.
>>> But you should test that option.
>>>
>>> Or if you don't really search for numbers as numbers (I mean with Range
>>> queries/filters), you could force each field to be a String and do the
>>> transformation at a client level.
>>>
>>> My 2 cents
>>>
>>>
>>> On 19 September 2014 at 05:31:38, Michael Chen ([email protected])
>>> wrote:
>>>
>>> Hi,
>>>
>>> We have a system very much like Google Forms, which allows users to
>>> design their own forms with various fields (single-line text, paragraph,
>>> number, address, etc.). As you might expect, it runs on top of MongoDB.
>>> It now has 120K forms with nearly 10 million entries.
>>>
>>> Recently we hit a performance bottleneck on queries. After doing every
>>> performance tuning we could on the MongoDB side, we decided to index
>>> the form entries into Elasticsearch. And there is a problem:
>>>
>>> Say Form A has field_1 as a string and field_2 as a number; a data
>>> entry might look like: { field_1: "hello", field_2: 100 }
>>>
>>> Form B could have field_1 as a number and field_2 as a string; a data
>>> entry will look like: { field_1: 100, field_2: "hello form" }
>>>
>>> We have successfully created an index "entries" in ES and can index the
>>> first entry. But the second one fails for an obvious reason:
>>> type mismatch.
>>>
>>> I am not sure how to deal with this problem. I definitely don't want to
>>> create 120K indices, one for every single form. And I am not sure it's
>>> doable to write a custom transform script that makes the field types
>>> identical across all entries.
>>>
>>> Any suggestions? I would much appreciate any response.
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/4a6d47d7-ae0e-44f5-bd3a-756ea94e3899%40googlegroups.com
>>> <https://groups.google.com/d/msgid/elasticsearch/4a6d47d7-ae0e-44f5-bd3a-756ea94e3899%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> --
>> Michael Chen
>> --------------------------------
>> Blog: http://michael.nona.name
>> GTalk/Twitter/Facebook/Yahoo/Skype: mechiland
--
Michael Chen
--------------------------------
Blog: http://michael.nona.name
GTalk/Twitter/Facebook/Yahoo/Skype: mechiland