I'm testing out Logstash and Elasticsearch on my local dev machine (Windows 7) 
as a replacement for our current SQL Server-based search pages.

I'm using the following Logstash config to import a folder full of 
pipe-delimited CSV files into Elasticsearch:

-------------------------------
input {
  stdin {
    type => "stdin-type"
  }

  file {
    path => ["C:/Users/.../export.csv"]
  }
}

filter {
  csv {
    columns => ["property_id","postal_code","status_id","address_1","city","state"]
    separator => "|"
  }
}

output {
  elasticsearch {
    embedded => true
    index => "assets"
    index_type => "asset"
  }
}
-------------------------------

1. Sometimes the import runs, sometimes it doesn't. I've deleted the .sincedb 
files over and over, and I've changed the index name to confirm the data is 
actually going in (when the import does run). Any idea why it's sporadic?
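My best guess is that I need to pin the sincedb location and start position 
explicitly instead of relying on the defaults. Something like the following is 
what I have in mind; start_position and sincedb_path are options I found in 
the file input docs, and "C:/temp/sincedb" is just a placeholder path I made 
up:

-------------------------------
input {
  file {
    path => ["C:/Users/.../export.csv"]
    # Read each file from the top instead of tailing new lines only.
    start_position => "beginning"
    # Keep the sincedb in a known spot so it can be deleted reliably
    # between test runs ("C:/temp/sincedb" is a made-up path).
    sincedb_path => "C:/temp/sincedb"
  }
}
-------------------------------

Is that the right approach on Windows, or is something else resetting the 
sincedb state?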

2. I have a data set of over a million records. The "_id" value of each 
record in ES is, of course, a unique string. If I add a new CSV file with 
updates for 100 records, how does Logstash or ES know how to match an 
update to an existing record? In the original data set, the "property_id" 
value is the primary key.

I looked 
at http://logstash.net/docs/1.3.3/outputs/elasticsearch#document_id , which 
seems to be the correct setting for the import, but what value should I use? 
I tried "property_id", the first column name, but that doesn't work; the 
import doesn't even run with that setting.
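My guess is that document_id wants Logstash's %{fieldname} sprintf syntax 
rather than a bare column name, something like this (untested on my end):

-------------------------------
output {
  elasticsearch {
    embedded => true
    index => "assets"
    index_type => "asset"
    # Guessing at the syntax: pull the document ID from the parsed
    # "property_id" column so a re-import with the same property_id
    # overwrites the existing document instead of adding a duplicate.
    document_id => "%{property_id}"
  }
}
-------------------------------

If that's right, re-indexing a row with the same property_id should update 
the existing document rather than create a new one, but I'd appreciate 
confirmation.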

Any help would be appreciated. Thanks.
