Hi,
Every time I run a POST request against _update, any indexed data that I
excluded from _source seems to disappear from the index.
Ideally, I would not have to store, for example, the contents of a
several-megabyte file in _source just to keep it searchable after calling
_update on the record.
To start, here is the version info for Elasticsearch:
{
  "status" : 200,
  "name" : "Feron",
  "version" : {
    "number" : "1.3.1",
    "build_hash" : "2de6dc5268c32fb49b205233c138d93aaf772015",
    "build_timestamp" : "2014-07-28T14:45:15Z",
    "build_snapshot" : false,
    "lucene_version" : "4.9"
  },
  "tagline" : "You Know, for Search"
}
Here's my cluster health:
{
  "cluster_name" : "my-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 5,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
A script for recreating the issue is attached. In it, I create a mapping and
index one record using the attachment plugin. The record correctly matches
searches on a field in _source, a field excluded from _source, and the content
(attachment) field, which is also excluded from _source.
As soon as I make the POST request to …/_update, searches against fields
excluded from _source return 0 hits.
Is storing every field in _source the only solution if I plan to call
_update on the record?
indexserver="example.org:9200"
indexname="sample"
curl -XDELETE "http://$indexserver/$indexname/losing_data/_mapping?pretty=1"
curl -XPUT "http://$indexserver/$indexname/losing_data/_mapping?pretty=1" -d '
{
  "losing_data" : {
    "_source" : {
      "enabled" : true,
      "excludes" : [ "content", "not_sourced" ]
    },
    "properties" : {
      "record_counts" : {
        "type" : "nested",
        "include_in_parent" : true,
        "properties" : {
          "first_count" : {
            "type" : "long"
          },
          "second_count" : {
            "type" : "long"
          }
        }
      },
      "description" : {
        "type" : "string"
      },
      "not_sourced" : {
        "type" : "string"
      },
      "blarf" : {
        "type" : "string"
      },
      "content" : {
        "type" : "attachment"
      }
    }
  }
}
'
file_path='test-1.rtf' # test-1.rtf is an RTF file containing the phrase "This contains red"
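# Slurp the whole file (-0777) and encode it without line breaks, so the
# base64 string is valid for multi-line files and contains no raw newlines
file_content=$(perl -MMIME::Base64 -0777 -ne 'print encode_base64($_, "")' "$file_path")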
json='
{
  "content" : "'${file_content}'",
  "not_sourced": "Giraffe",
  "description": "This is not about bears",
  "record_counts": {
    "first_count": 1,
    "second_count": 100
  }
}
'
echo "$json" > json.file
# refresh=true makes the record visible to the searches that follow immediately
curl -XPUT "http://$indexserver/$indexname/losing_data/1?refresh=true" -d @json.file
curl "http://$indexserver/$indexname/losing_data/_search?q=red&pretty=1"
# should return one hit (from the file)
curl "http://$indexserver/$indexname/losing_data/_search?q=giraffe&pretty=1"
# should return one hit (from not_sourced field)
curl "http://$indexserver/$indexname/losing_data/_search?q=bears&pretty=1"
# should return one hit (from description)
curl "http://$indexserver/$indexname/losing_data/1?pretty=1"
# record_counts > first_count should be 1
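# refresh=true makes the updated record visible to the searches that follow immediately
curl -XPOST "http://$indexserver/$indexname/losing_data/1/_update?refresh=true" -d '{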
"script": "ctx._source.record_counts.first_count += 1",
"lang": "groovy"
}'
curl "http://$indexserver/$indexname/losing_data/1?pretty=1"
# record_counts > first_count should be 2
curl "http://$indexserver/$indexname/losing_data/_search?q=red&pretty=1"
# I get 0 hits
curl "http://$indexserver/$indexname/losing_data/_search?q=giraffe&pretty=1"
# I get 0 hits
curl "http://$indexserver/$indexname/losing_data/_search?q=bears&pretty=1"
# description is in source, so I still get a hit