Hello everyone,

I have a lot of files, each with many short lines (~45000 per file). Each 
line consists of a keyword and some additional data.
I store each file and its metadata as an object: {_index: "default", _type: 
"file", _id: filename, _source: {various metadata} }
I store each line as a child of its file:
        body_mapping = {"line": {
            "_parent" :{
               "type" :"file"
               }
            }
        }

{"_index": "default",
    "_type": "line",
    "_id": line_number,
    "_parent": filename,
    "_source": {"keyword": keyword,
                # plus the additional metadata fields
                }
    }

My goal is to search across all my files by keyword:
{"query":
    {"query_string":
        {"query": keyword,
         "fields": ["keyword"]
        }
    }
}
But there is more to it: I want to search a bunch of keywords from a given 
file (all lines from an existing file or a new one) and aggregate the 
results by filename.
For example, the result would be:
{filename1: [{keyword: my_search_keyword,
              metadata_for_this_keyword_in_file1, _id: line_number},
             {keyword: my_search_keyword,
              metadata_for_this_keyword_in_file1, _id: line_number}, ...],
 filename2: [{keyword: my_search_keyword,
              metadata_for_this_keyword_in_file2, _id: line_number},
             {keyword: my_search_keyword,
              metadata_for_this_keyword_in_file2, _id: line_number}, ...],
 filename5: [{keyword: my_search_keyword,
              metadata_for_this_keyword_in_file5, _id: line_number},
             {keyword: my_search_keyword,
              metadata_for_this_keyword_in_file5, _id: line_number}, ...],
}
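
To make the target structure concrete, here is a minimal sketch of the grouping step done client-side. It assumes each hit is a dict that carries the parent filename under a "_parent" key (how that value ends up in the hit depends on the Elasticsearch version and on the fields requested); the function name group_hits_by_file is only illustrative.

```python
from collections import defaultdict


def group_hits_by_file(hits):
    """Group line hits by their parent filename.

    Assumption: every hit has "_id", "_parent" and (optionally) "_source".
    """
    grouped = defaultdict(list)
    for hit in hits:
        # Copy the stored fields and attach the line number (_id).
        entry = dict(hit.get("_source", {}))
        entry["_id"] = hit["_id"]
        grouped[hit["_parent"]].append(entry)
    return dict(grouped)
```

The hits passed in would be the concatenation of response["hits"]["hits"] from each search response.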

Important point:  There are a lot of collisions, keyword-wise.

At the moment I am using elasticsearch-py with the es.msearch function and 
the query shown above. However, this is quite slow, so I suspect that 
my object design, my mapping, or my search strategy is wrong.
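
For reference, this is roughly how the msearch body is put together, as a sketch rather than my exact code. The helper name build_msearch_body is illustrative, and the header assumes the "default" index and "line" type described above.

```python
def build_msearch_body(keywords, index="default", doc_type="line"):
    """Build an msearch body: one header dict and one query dict per keyword."""
    body = []
    for keyword in keywords:
        # Header line: which index/type this search targets.
        body.append({"index": index, "type": doc_type})
        # Query line: the query_string query restricted to the keyword field.
        body.append({
            "query": {
                "query_string": {
                    "query": keyword,
                    "fields": ["keyword"],
                }
            }
        })
    return body


# Usage, assuming `es` is an elasticsearch.Elasticsearch client:
#   responses = es.msearch(body=build_msearch_body(keywords))
```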

Do you have any insight to share? Thanks a lot!
