garyli1019 commented on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632255322
We have been using HUDI to manage a data lake with 500+ TB of manufacturing data for almost a year now. In the IoT world, late-arriving data and updates are very common, and HUDI handles them perfectly for us. We use Impala to query the data. HUDI's small-file handling and easy partitioning let us build an efficient layout that we can query on the fly.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org