[jira] [Updated] (DRILL-3867) Store relative paths in metadata file

Vitalii Diravka (JIRA) Fri, 11 Aug 2017 01:46:57 -0700

     [ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Vitalii Diravka updated DRILL-3867:
-----------------------------------
    Description: 
git.commit.id.abbrev=cf4f745
git.commit.time=29.09.2015 @ 23\:19\:52 UTC

The following steps reproduce the issue:

1. Create the cache file
{code}
0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem`;
+-------+-------------------------------------------------------------------------------------+
|  ok   |                                       summary                                       |
+-------+-------------------------------------------------------------------------------------+
| true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem.  |
+-------+-------------------------------------------------------------------------------------+
1 row selected (1.558 seconds)
{code}

2. Move the directory
{code}
hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
{code}

3. Now run a query on top of it
{code}
0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 1;
Error: SYSTEM ERROR: FileNotFoundException: Requested file maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.

[Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] (state=,code=0)
{code}
This is expected, because the cache file stores absolute file paths, which 
still point to the old location after the directory is moved.

*Summary description of the fix:*

Drill 1.11 and later stores the paths to the Parquet files in the metadata 
file as relative paths instead of absolute paths. You can move partitioned 
Parquet directories from one location in the distributed file system to 
another without issuing the REFRESH TABLE METADATA command to rebuild the 
Parquet metadata files; the metadata remains valid in the new location.
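
A minimal sketch of why relative paths survive the move, using the paths from 
the reproduction steps. This is an illustration only, not Drill's 
implementation: the helper names to_relative and resolve are hypothetical, 
and the real metadata file format is not modelled here.

```python
import posixpath


def to_relative(table_root: str, absolute_path: str) -> str:
    """Hypothetical helper: strip the table root from a file path,
    so the stored entry no longer depends on where the table lives."""
    return posixpath.relpath(absolute_path, start=table_root)


def resolve(table_root: str, relative_path: str) -> str:
    """Hypothetical helper: rebuild a full path from the table's
    current root and a stored relative entry."""
    return posixpath.join(table_root, relative_path)


old_root = "/drill/testdata/metadata_caching/lineitem"
new_root = "/drill/lineitem"

# Entry recorded while the table still lived under old_root.
cached_entry = to_relative(old_root, old_root + "/2006/1")

# After "hadoop fs -mv", resolving against the new root yields a path
# under the new location instead of the stale old one.
resolved = resolve(new_root, cached_entry)
print(cached_entry, resolved)
```

An absolute entry would have stayed at 
/drill/testdata/metadata_caching/lineitem/2006/1 after the move, which is the 
path reported in the FileNotFoundException above.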

Note

Reverting from Drill 1.11 to an earlier version is not recommended, because 
the earlier version will misinterpret the Parquet metadata files created by 
Drill 1.11. If you do revert, remove the Parquet metadata files and run the 
REFRESH TABLE METADATA command to rebuild the files in the older format.
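
To locate the metadata files before removing them, something like the 
following sketch could be used. It assumes the table directory is reachable 
through a local mount (for example over NFS) and that the cache files use the 
.drill.parquet_metadata name prefix; verify both for your deployment. The 
function name find_metadata_files is hypothetical, and the sketch only lists 
files, it does not delete anything.

```python
import os

# Assumed name prefix of Drill's Parquet metadata cache files.
METADATA_PREFIX = ".drill.parquet_metadata"


def find_metadata_files(table_root):
    """Return the sorted paths of all metadata cache files found
    under table_root, including those in partition subdirectories."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(table_root):
        for name in filenames:
            if name.startswith(METADATA_PREFIX):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Example call (the mount point is a placeholder):
# for path in find_metadata_files("/mnt/dfs/drill/lineitem"):
#     print(path)
```

After reviewing and removing the listed files, run REFRESH TABLE METADATA on 
the table from the older Drill version.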


  was:
git.commit.id.abbrev=cf4f745
git.commit.time=29.09.2015 @ 23\:19\:52 UTC

The below sequence of steps reproduces the issue

1. Create the cache file
{code}
0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem`;
+-------+-------------------------------------------------------------------------------------+
|  ok   |                                       summary                                       |
+-------+-------------------------------------------------------------------------------------+
| true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem.  |
+-------+-------------------------------------------------------------------------------------+
1 row selected (1.558 seconds)
{code}

2. Move the directory
{code}
hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
{code}

3. Now run a query on top of it
{code}
0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 1;
Error: SYSTEM ERROR: FileNotFoundException: Requested file maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.

[Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] (state=,code=0)
{code}

This is obvious given the fact that we are storing absolute file paths in the 
cache file


> Store relative paths in metadata file
> -------------------------------------
>
>                 Key: DRILL-3867
>                 URL: https://issues.apache.org/jira/browse/DRILL-3867
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.11.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
