Alexx-G opened a new issue #746:
URL: https://github.com/apache/arrow-rs/issues/746


   **Which part is this question about**
   library API and documentation
   
   **Describe your question**
   I'd like to extend a log forwarder by adding support for writing log events 
in `parquet` format. The overall flow is rather simple: collect & process log 
events, batch & buffer the processed events, then write the batches as parquet 
files somewhere (e.g. AWS S3).
   There are a few problems that I'm not sure how to solve:
   - the same stream of log events may not have a uniform shape (e.g. some log 
events are missing a field), so a "schema-less" approach is highly desirable;
   - some structured logs may have nested data structures (e.g. a request event 
with headers, which is a dictionary);
   - what is the best approach for buffering log events and serializing a batch 
to a parquet file?
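   
   To make the three points above concrete, here is a minimal std-only sketch 
of the buffering step, under stated assumptions. The names `Event`, 
`ColumnarBatch`, `BatchBuffer` and `flatten_into` are hypothetical and not part 
of any crate. It assumes every value is a string, derives the batch schema as 
the union of field names seen in the batch, fills missing fields with nulls, 
and flattens a nested dictionary into dotted column names. The actual Parquet 
encoding (for example via the `parquet` crate's Arrow writer) is deliberately 
left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical event shape: a flat map of field name -> string value.
type Event = BTreeMap<String, String>;

/// Columnar view of one batch: one nullable column per field name.
#[derive(Debug, PartialEq)]
struct ColumnarBatch {
    columns: BTreeMap<String, Vec<Option<String>>>,
    num_rows: usize,
}

/// Copies a nested dictionary (e.g. request headers) into `event`
/// using dotted keys such as "headers.host". This is one possible
/// convention, assumed here for illustration only.
fn flatten_into(event: &mut Event, prefix: &str, nested: &BTreeMap<String, String>) {
    for (key, value) in nested {
        event.insert(format!("{}.{}", prefix, key), value.clone());
    }
}

/// Buffers events and emits a batch once `capacity` events are collected.
struct BatchBuffer {
    capacity: usize,
    events: Vec<Event>,
}

impl BatchBuffer {
    fn new(capacity: usize) -> Self {
        Self { capacity, events: Vec::with_capacity(capacity) }
    }

    /// Adds an event; returns a finished batch when the buffer is full.
    fn push(&mut self, event: Event) -> Option<ColumnarBatch> {
        self.events.push(event);
        if self.events.len() >= self.capacity {
            Some(self.flush())
        } else {
            None
        }
    }

    /// Drains the buffer into a columnar batch. The schema is the union
    /// of all field names in the batch; absent fields become `None`.
    fn flush(&mut self) -> ColumnarBatch {
        let events = std::mem::take(&mut self.events);
        let fields: BTreeSet<&String> = events.iter().flat_map(|e| e.keys()).collect();
        let mut columns = BTreeMap::new();
        for field in fields {
            let column: Vec<Option<String>> =
                events.iter().map(|e| e.get(field).cloned()).collect();
            columns.insert(field.clone(), column);
        }
        ColumnarBatch { columns, num_rows: events.len() }
    }
}
```

   With this shape each flushed batch carries its own schema, so two batches 
from the same stream may end up with different columns. Whether that is 
acceptable for the resulting parquet files is part of the question.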
   
   I have little to moderate Rust experience and no parquet experience, so it's 
a bit challenging to get a clear picture of the way forward. I'd highly 
appreciate some pointers/examples on how to approach this problem.
   
   **Additional context**
   I'm collecting info on how the parquet format can be integrated into Vector 
(a log forwarder written in Rust).
   https://github.com/timberio/vector/issues/1374



