[ https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mehant Baid resolved DRILL-2424.
--------------------------------
Resolution: Duplicate
Assignee: Mehant Baid (was: Steven Phillips)
> Ignore hidden files in directory path
> -------------------------------------
>
> Key: DRILL-2424
> URL: https://issues.apache.org/jira/browse/DRILL-2424
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - JSON, Storage - Text & CSV
> Affects Versions: 0.7.0
> Reporter: Andries Engelbrecht
> Assignee: Mehant Baid
> Fix For: 1.2.0
>
>
> When streaming data to the DFS, some records can be incomplete during the
> temporary write phase for the last file(s). These files typically have a
> different extension, such as '.tmp', or are marked hidden with a '.' prefix.
> Querying the directory path with Drill then causes a query error, because
> some records in the temporary files may be incomplete. Giving Drill the
> ability to ignore hidden files and/or to read only files with a designated
> extension in the workspace would resolve this problem.
>
> For example, when Flume streams JSON files to a directory structure, the
> HDFS sink creates .tmp files (which can also be hidden with a '.' prefix)
> that contain incomplete JSON objects until the file is closed and the .tmp
> extension (or prefix) is removed. Querying the directory structure with
> Drill then fails because of the incomplete JSON object(s) in the .tmp files.
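The filtering the description asks for (skip dot-prefixed files, and optionally read only files with a designated extension) can be sketched as a small Java file filter. This is a minimal sketch under stated assumptions, not Drill's implementation: the class `VisibleFileFilter` and its methods `isQueryable` and `listQueryableFiles` are hypothetical, it works on a local directory via `java.nio.file` rather than on DFS, and it needs Java 11+ for `Files.writeString`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical filter, not Drill's actual code: decides which files in a
// workspace directory a query should read.
public class VisibleFileFilter {

    // Returns false for hidden (dot-prefixed) names. If requiredExtension is
    // non-null, also requires the name to end with it (e.g. ".json"), which
    // excludes in-progress files such as "events-2.json.tmp".
    static boolean isQueryable(String fileName, String requiredExtension) {
        if (fileName.startsWith(".")) {
            return false;
        }
        return requiredExtension == null || fileName.endsWith(requiredExtension);
    }

    // Lists the regular files in dir that pass isQueryable, sorted by path.
    static List<Path> listQueryableFiles(Path dir, String requiredExtension) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)
                        && isQueryable(p.getFileName().toString(), requiredExtension)) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a Flume-style sink directory: one complete file, one .tmp
        // file and one hidden file that are still being written.
        Path dir = Files.createTempDirectory("flume-sink");
        Files.writeString(dir.resolve("events-1.json"), "{\"id\": 1}\n");
        Files.writeString(dir.resolve("events-2.json.tmp"), "{\"id\": 2");
        Files.writeString(dir.resolve(".events-3.json"), "{\"id\": ");

        for (Path p : listQueryableFiles(dir, ".json")) {
            System.out.println(p.getFileName()); // prints events-1.json only
        }
    }
}
```

Requiring a designated extension covers the '.tmp' case without listing every temporary suffix a writer might use, while the dot-prefix check handles writers that hide in-progress files instead of renaming them.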