Andries Engelbrecht created DRILL-2424:
------------------------------------------

             Summary: Ignore hidden files in directory path
                 Key: DRILL-2424
                 URL: https://issues.apache.org/jira/browse/DRILL-2424
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - JSON, Storage - Text & CSV
    Affects Versions: 0.7.0
            Reporter: Andries Engelbrecht
            Assignee: Steven Phillips


When streaming data to the DFS, some records can be incomplete during the 
temporary write phase for the last file(s). These files typically have a 
different extension such as '.tmp', or can be marked hidden with a '.' prefix.

Querying the directory path with Drill will then cause a query error, as some 
records may not be complete in the temporary files. Having the ability to have 
Drill ignore hidden files and/or read only files with a designated extension in 
the workspace would resolve this problem.
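
The requested behavior can be sketched roughly as follows. This is a minimal 
illustration in Python of the filtering rule only, not Drill code: the names 
is_queryable, select_files, and allowed_extensions are hypothetical, and the 
sample paths are made up.

```python
from pathlib import PurePosixPath


def is_queryable(path, allowed_extensions=None, hidden_prefix="."):
    """Return True if a file should be read by a directory query.

    Hypothetical rule: skip names starting with hidden_prefix, and, when
    allowed_extensions is given, keep only files whose last extension is
    in that set.
    """
    name = PurePosixPath(path).name
    if name.startswith(hidden_prefix):
        return False  # hidden file, e.g. ".events.json.tmp"
    if allowed_extensions is not None:
        # suffix is the last extension only, so "b.json.tmp" yields ".tmp"
        return PurePosixPath(name).suffix.lower() in allowed_extensions
    return True


def select_files(paths, allowed_extensions=None):
    """Filter a directory listing down to the files that are safe to read."""
    return [p for p in paths if is_queryable(p, allowed_extensions)]


# Made-up listing of a directory that a streaming writer is still filling.
listing = [
    "/flume/events/a.json",
    "/flume/events/b.json.tmp",
    "/flume/events/.c.json.tmp",
]
print(select_files(listing, allowed_extensions={".json"}))
```

With only the hidden-file rule, b.json.tmp would still be read, which is why 
the extension restriction is requested as well.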

An example is using Flume to stream JSON files to a directory structure: the 
HDFS sink creates .tmp files (which can be hidden with a . prefix) that contain 
incomplete JSON objects until the file is closed and the .tmp extension (or 
prefix) is removed. Attempting to query the directory structure with Drill then 
results in errors due to the incomplete JSON object(s) in the .tmp files.
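
To illustrate the failure mode, here is a small sketch using Python's json 
module rather than Drill's JSON reader. It assumes one JSON object per line; 
the function name parse_json_lines and the sample records are hypothetical. 
The second input mimics a file that is still being written, where the last 
object is cut off.

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON, raising on any incomplete object."""
    records = []
    for line in text.splitlines():
        if line.strip():
            records.append(json.loads(line))
    return records


# Contents of a closed file: every object is complete.
closed_file = '{"id": 1, "msg": "ok"}\n{"id": 2, "msg": "ok"}\n'

# Contents of a file still open for writing: the last object is truncated.
open_tmp_file = '{"id": 3, "msg": "ok"}\n{"id": 4, "ms'

print(len(parse_json_lines(closed_file)))

try:
    parse_json_lines(open_tmp_file)
except json.JSONDecodeError as err:
    # A single truncated record fails the read of the whole input.
    print("parse failed:", err.msg)
```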



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
