Thanks, Kay,

Have you implemented something similar before?
Apologies - I have a lot of questions and assumptions. I'd be extremely
grateful if you could help me out.

Could you please elaborate on the following:
- Where should the scroll search logic run? (A Java app? A Unix script?)
- How often does the scroll search run?
- How do we handle incremental exports and keep track of what has already 
been exported, or do we just run the export job on 
non-deflector ("read-only") indexes when the deflector is cycled?
- What should the output of the scroll search be? (A file? JSON? CSV? Over the network?)
- How is the output from the scroll search written/imported to Hive? 
- Do we just dump JSON files on HDFS and then use a JSON SerDe? (...a wild 
assumption based on "speed-reading"-style research :) )
- Can HCatalog pick up the schema dynamically from JSON, or do we need to 
create the tables manually?
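
For context, here's roughly the shape of export job I'm picturing - a minimal 
sketch only, not a working solution. Everything in it is an assumption on my 
part: the host/port, the index name "graylog2_0", the 0.90-era scan/scroll 
endpoints, and the idea of writing newline-delimited JSON for an HDFS drop:

```python
# Hypothetical sketch: dump an Elasticsearch index to newline-delimited JSON
# for a later HDFS put + JSON SerDe. Assumes ES on localhost:9200 and the
# 0.90 scan/scroll API (search_type=scan, then /_search/scroll).
import json
import urllib.request

ES = "http://localhost:9200"

def hits_to_ndjson(hits):
    """Flatten one page of scroll hits into one JSON object per line."""
    return "".join(json.dumps(hit["_source"]) + "\n" for hit in hits)

def scroll_export(index, out_path, page_size=500):
    # Open a scan-type scroll; keep the scroll context alive for 5 minutes.
    url = f"{ES}/{index}/_search?search_type=scan&scroll=5m&size={page_size}"
    body = json.dumps({"query": {"match_all": {}}}).encode()
    resp = json.load(urllib.request.urlopen(urllib.request.Request(url, body)))
    scroll_id = resp["_scroll_id"]
    with open(out_path, "w") as out:
        while True:
            # Fetch the next page; the scroll id is sent as the request body.
            req = urllib.request.Request(
                f"{ES}/_search/scroll?scroll=5m", scroll_id.encode())
            page = json.load(urllib.request.urlopen(req))
            hits = page["hits"]["hits"]
            if not hits:  # empty page means the scroll is exhausted
                break
            out.write(hits_to_ndjson(hits))
            scroll_id = page["_scroll_id"]

if __name__ == "__main__":
    scroll_export("graylog2_0", "graylog2_0.ndjson")
```

Does that look like a sane starting point, or am I off in the weeds?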

Many thanks!

Chris



On Thursday, February 27, 2014 2:47:52 PM UTC+2, Kay Röpke wrote:
>
> Hi!
>
> The easiest way is to use the scroll feature of elasticsearch:
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/search-request-scroll.html
>
> That way you can iterate over all the documents in indices and write them 
> to Hive.
> We don't have a built-in way to perform archiving yet, but this should 
> solve your immediate problem with minimal effort and impact.
>
> Best,
> Kay
>
> On Thursday, February 27, 2014 9:36:59 AM UTC+1, ChrisDK wrote:
>>
>> Hi Guys,
>>
>> We have a requirement to archive our Graylog2 (v0.20.1) data into Hive. 
>> With a 400 million cap we currently keep only a couple of weeks' data, 
>> whereas the requirement is 36 months.
>>
>> Ideally these exports should run near real-time, not batched as nightly 
>> exports.
>> It should also have minimal impact on our live Elasticsearch cluster.
>>
>> What would be the best way to do this?
>>
>> Thanks!
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"graylog2" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.