[
https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611149#comment-14611149
]
Chris Westin commented on DRILL-2424:
-------------------------------------
DRILL-1131 requests this as a feature, but this bug demonstrates that not
having it causes problems for queries that are run while temporary output files
are being used.
> Ignore hidden files in directory path
> -------------------------------------
>
> Key: DRILL-2424
> URL: https://issues.apache.org/jira/browse/DRILL-2424
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - JSON, Storage - Text & CSV
> Affects Versions: 0.7.0
> Reporter: Andries Engelbrecht
> Assignee: Steven Phillips
> Fix For: 1.2.0
>
>
> When streaming data to the DFS, some records can be incomplete during the
> temporary write phase for the last file(s). These files typically have a
> different extension like '.tmp' or can be marked hidden with a prefix of '.'.
> Querying the directory path with Drill will then cause a query error, as some
> records may not be complete in the temporary files. Having the ability to
> have Drill ignore hidden files and/or to only read files with a designated
> extension in the workspace will resolve this problem.
> An example is using Flume to stream JSON files to a directory structure: the
> HDFS sink creates .tmp files (which can be hidden with a . prefix) that contain
> incomplete JSON objects until the file is closed and the .tmp extension (or
> prefix) is removed. Attempting to query the directory structure with Drill
> then results in errors due to the incomplete JSON object(s) in the .tmp files.
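
The filtering requested in the description could look roughly like the following Python sketch. It is not Drill's implementation. The function names `is_queryable` and `select_files`, the sample paths, and the choice to treat `.tmp` as the only temporary suffix are assumptions made for illustration. The sketch only inspects the file name, not hidden parent directories.

```python
from pathlib import PurePosixPath


def is_queryable(path, allowed_extensions=None, temp_suffixes=(".tmp",)):
    """Return True if a file should be read by a directory scan.

    Hypothetical helper: skips hidden files ('.' prefix), files with a
    temporary suffix, and (optionally) files outside an extension whitelist.
    """
    name = PurePosixPath(path).name
    # Hidden files, e.g. an in-progress file written with a '.' prefix.
    if name.startswith("."):
        return False
    # In-progress files that still carry a temporary suffix such as '.tmp'.
    if name.endswith(tuple(temp_suffixes)):
        return False
    # Optional whitelist of designated extensions, e.g. {".json"}.
    if allowed_extensions is not None:
        return PurePosixPath(name).suffix.lower() in allowed_extensions
    return True


def select_files(paths, allowed_extensions=None):
    """Keep only the paths that pass is_queryable, preserving order."""
    return [p for p in paths if is_queryable(p, allowed_extensions)]


# Made-up directory listing resembling a streaming sink's output.
listing = [
    "/flume/events/log-001.json",
    "/flume/events/log-002.json.tmp",
    "/flume/events/.log-003.json",
    "/flume/events/notes.txt",
]

print(select_files(listing))
print(select_files(listing, allowed_extensions={".json"}))
```

Without a whitelist, only the hidden and `.tmp` files are skipped. Passing `allowed_extensions={".json"}` also drops `notes.txt`, which corresponds to the "only read files with a designated extension" option in the description.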