[I] Can Apache Drill query a list of files with updated data? (drill)

via GitHub Mon, 29 Apr 2024 13:04:06 -0700


kevinlo opened a new issue, #2910:
URL: https://github.com/apache/drill/issues/2910

I am new to Apache Drill. I posted this question on Stack Overflow and did not
get an answer, so I am trying here to see if someone can answer it. I
am sorry if this is not the right place.

I have a large (more than 8.5GB) CSV file that is regenerated on the first day
of each month. From the 2nd to the last day of each month, new or updated
records can arrive in JSON format. These JSON records are merged into the
CSV, which becomes the new CSV on the first day of the next month.

I converted the CSV to Parquet and queried it in Apache Drill, and that works
fine. But how can I query the big file together with the update files?

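For example, the Apr 1st CSV file has:

| ID  | Name | Value | LastUpdatedTime |
|-----|------|-------|-----------------|
| 100 | John | 98    | 2024-01-05      |

The Apr 15 JSON file has:

| ID  | Name | Value | LastUpdatedTime |
|-----|------|-------|-----------------|
| 100 | John | 100   | 2024-04-15      |

When all these files are queried for ID = 100, the result should be Value=100,
because that row has the newer LastUpdatedTime.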
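
To make the expected result concrete, here is a minimal Python sketch of the "newest LastUpdatedTime wins" rule applied to the two example rows above. It only illustrates the semantics I am after; it is not Drill code, and the function name `latest_per_id` and the in-memory row dictionaries are hypothetical stand-ins for the real Parquet and JSON files.

```python
from datetime import date


def latest_per_id(*sources):
    """Return a dict of ID -> row, keeping the row with the newest LastUpdatedTime."""
    latest = {}
    for rows in sources:
        for row in rows:
            current = latest.get(row["ID"])
            # Keep the row if this ID is new or if it is strictly newer than the stored one.
            if current is None or row["LastUpdatedTime"] > current["LastUpdatedTime"]:
                latest[row["ID"]] = row
    return latest


# Hypothetical in-memory copies of the example rows (monthly CSV and mid-month JSON).
csv_rows = [
    {"ID": 100, "Name": "John", "Value": 98, "LastUpdatedTime": date(2024, 1, 5)},
]
json_rows = [
    {"ID": 100, "Name": "John", "Value": 100, "LastUpdatedTime": date(2024, 4, 15)},
]

merged = latest_per_id(csv_rows, json_rows)
print(merged[100]["Value"])
```

The result does not depend on the order in which the sources are read, which is the behaviour I would like from a query over the base file plus any number of update files.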

I found this
[post](https://stackoverflow.com/questions/48660704/update-insert-when-modifying-rdbmss-using-drill)
saying that people use Drill on data that is no longer changing.

Is that true?
