Thanks very much, Jörg, for your answer! I see the approach...

I understand that integrity checks aren't possible in a schema-less engine like ElasticSearch. However, would it be possible to have checks at the field-structure level before indexing is triggered in the ES engine? Perhaps by specifying something like this in the mapping:

{
  "mappings": {
    "mydoc": {
      "properties": {
        (...)
        "name": {
          "type":"string", "store":"yes", "index":"analyzed",
          "checks": "not_null,not_empty,regexp=^[a-zA-Z]$"
        },
        (...)
      }
    }
  }
}

Thanks very much for your help!
Thierry

You can validate the data on the client side, in your model, before serializing it to JSON, or you can validate it after a complete bulk index run.
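A minimal sketch of the client-side option, assuming a document is a plain Python dict with a "name" field as in the mapping proposed above. The function names and error messages are hypothetical, and the checks are ordinary application code, not an Elasticsearch feature. The pattern here accepts one or more letters, which is an assumption about what the proposed regexp was meant to express.

```python
import json
import re

# Assumed rule for the "name" field: one or more ASCII letters.
NAME_PATTERN = re.compile(r"[a-zA-Z]+")


def validate_name(doc):
    """Return a list of error strings for the 'name' field; empty if valid."""
    errors = []
    value = doc.get("name")
    if value is None:
        errors.append("name: must not be null")
    elif not isinstance(value, str):
        errors.append("name: must be a string")
    elif value == "":
        errors.append("name: must not be empty")
    elif not NAME_PATTERN.fullmatch(value):
        errors.append("name: must contain only letters")
    return errors


def to_json_if_valid(doc):
    """Serialize doc to JSON, or raise ValueError listing the failed checks."""
    errors = validate_name(doc)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps(doc)
```

Only documents that pass `to_json_if_valid` would then be handed to the index or bulk request, so rejected documents never reach the cluster.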

There are reasons why Elasticsearch is schema-less. Being schema-less means allowing any number of different fields (keys) and any content in those fields (values), without any logical constraints.

In a distributed system, per-field commits, per-field transactions, or integrity checking can get very expensive. Because the index is inverted, and nodes can come and go, there is a significant penalty if you want document transaction safety and document integrity checks.

I validate data in ES with the help of a large scan/scroll over the docs after bulk indexing, searching for IDs to see whether they exist or not. This is different from integrity constraint checking techniques such as the rule-based methods known from RDBMSs.
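The post-indexing check can be sketched roughly as follows, assuming the scan/scroll yields hits that carry an "_id" key. The `fake_scroll` generator is a hypothetical stand-in for a real scroll over the index (for instance the scan helper of the Python client), so that the example runs without a cluster.

```python
def find_missing_ids(expected_ids, indexed_ids):
    """Return the sorted expected IDs that were not seen in indexed_ids.

    indexed_ids is any iterable of document IDs, e.g. the IDs yielded while
    scrolling over the index; it is consumed once, in a streaming fashion.
    """
    remaining = set(expected_ids)
    for doc_id in indexed_ids:
        remaining.discard(doc_id)
        if not remaining:
            break  # everything accounted for; stop scrolling early
    return sorted(remaining)


def fake_scroll():
    """Hypothetical stand-in for a scan/scroll over the index."""
    for doc_id in ["a1", "a2", "a4"]:
        yield {"_id": doc_id}


missing = find_missing_ids(
    ["a1", "a2", "a3", "a4"],
    (hit["_id"] for hit in fake_scroll()),
)
print(missing)  # ['a3']
```

Any IDs reported as missing can then be re-sent in a follow-up bulk request.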

Jörg



On Sat, Feb 15, 2014 at 10:40 PM, Thierry Templier wrote:

    Hello,

    I wonder if there is a built-in way to validate data before
    indexing it. I see two kinds of validation:

    * Structural validation of fields, based on a regular expression
    for example. Perhaps something can be configured in the mapping...
    * Integrity validation of the document, for example preventing the
    indexing of a document with a field value that already exists.

    In the case where there is no built-in support at the moment, is
    there a way to extend ElasticSearch to add such processing before
    indexing using the standard REST calls?

    Thanks very much for your help!

-- You received this message because you are subscribed to the Google
    Groups "elasticsearch" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected]
    <mailto:elasticsearch%[email protected]>.
    To view this discussion on the web visit
    
https://groups.google.com/d/msgid/elasticsearch/52FFDEC9.7020007%40gmail.com.
    For more options, visit https://groups.google.com/groups/opt_out.

