benj created DRILL-7219:
---------------------------

             Summary: Ignore hidden file problems
                 Key: DRILL-7219
                 URL: https://issues.apache.org/jira/browse/DRILL-7219
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - JSON, Storage - Parquet, Storage - Text & CSV
    Affects Versions: 1.15.0
            Reporter: benj


Drill seems to apply different hidden-file filtering rules depending on the file type.
 * *Parquet*: hidden files (names starting with ".") are filtered out +whether+ we query 
the directory or its files with *
{code:java}
/* DirPqt
   |--sub1.pqt
   |--sub2.pqt
   |--.sub3.pqt
*/
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt`);
=> 2
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt/*`);
=> 2
/* It's possible to request the hidden file */
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt/.*`);
=> 1
/* But I don't know how to request visible and hidden files simultaneously (except 
with a UNION) */
{code}

 * *CSV, JSON*: whether hidden files (names starting with ".") are filtered out +depends+ 
on whether the query targets the directory or its files
{code:java}
/* DirCSVH
   |--sub1.csvh
   |--sub2.csvh
   |--.sub3.csvh
*/
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH`);
=> 2
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/*`);
=> 3
/* As with Parquet, it's possible to request the hidden file */
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/.*`);
=> 1
/* It's also possible to request only visible */
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/[^.]*`);
=> 2
/* But I don't know how to request visible and hidden files simultaneously (except 
with a UNION) */
{code}
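
To make the CSV counts above easier to reason about, here is a minimal Python sketch of plain glob matching over the three file names. It is only a model, not Drill's code: it assumes the glob syntax is Hadoop-style (where [^...] negates a character class), and the helpers glob_to_regex and match_names are hypothetical names introduced for illustration.

```python
import re

# File names taken from the DirCSVH example in this issue.
FILES = ["sub1.csvh", "sub2.csvh", ".sub3.csvh"]


def glob_to_regex(pattern):
    """Translate a tiny glob subset (*, ?, [...], [^...]) into a regex.

    Assumption: '[^...]' negates a character class, as in Hadoop-style globs.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            out.append(".*")          # any run of characters, dots included
        elif c == "?":
            out.append(".")           # exactly one character
        elif c == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                out.append(re.escape(c))  # unmatched '[' is taken literally
            else:
                body = pattern[i + 1:end].replace("\\", "\\\\")
                out.append("[" + body + "]")
                i = end
        else:
            out.append(re.escape(c))
        i += 1
    return re.compile("".join(out))


def match_names(pattern, names):
    """Return the names that the glob pattern matches in full."""
    regex = glob_to_regex(pattern)
    return [n for n in names if regex.fullmatch(n)]


for pattern in ["*", ".*", "[^.]*"]:
    print(pattern, len(match_names(pattern, FILES)))
```

Under these assumptions a bare glob does no hidden-file filtering at all, which is consistent with the CSV results (3, 1, 2). The Parquet result for `DirPqt/*` (2) would then suggest that an extra hidden-file filter is applied after glob expansion, but that is only a guess from the observed counts.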

Some existing issues deal with hidden files, for example DRILL-2424, but I did not 
find any description of this filtering in the documentation. I found that hidden 
files start with "." or "_", but maybe there are other cases?
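
The rule as I understand it can be sketched like this in Python. This is a minimal sketch, assuming the only hidden prefixes are "." and "_" (which is exactly what I am not sure about); is_hidden and list_visible are hypothetical helper names, and "_sub4.pqt" is a made-up file added to exercise the "_" case.

```python
# Prefixes reported in this issue; there may be others.
HIDDEN_PREFIXES = (".", "_")


def is_hidden(name):
    """Return True if the file name starts with one of the assumed hidden prefixes."""
    return name.startswith(HIDDEN_PREFIXES)


def list_visible(names):
    """Keep only the names that the assumed rule does not treat as hidden."""
    return [n for n in names if not is_hidden(n)]


# DirPqt content from this issue, plus the hypothetical "_sub4.pqt".
names = ["sub1.pqt", "sub2.pqt", ".sub3.pqt", "_sub4.pqt"]
visible = list_visible(names)
print(visible)
```

Applied to a directory listing, such a filter would give the count of 2 seen when querying `DirPqt` or `DirCSVH` directly.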

It's a little strange that the filtering rules differ depending on the file type.
It's also impractical that there is no simple way to say whether or not we want 
hidden files, for example with something like:
{code:java}
SELECT * FROM ....`MyDir/[.]?*`;
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
