[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-5771:
------------------------------------
    Description: 
Create unit tests to check that all storage format plugins can be successfully
serialized/deserialized.
Serialization/deserialization usually happens when a query has several major fragments.

One way to check serde is to generate the physical plan (as JSON) and
then submit it back to Drill.
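
A minimal sketch of such a round-trip check, assuming a hypothetical `SampleFormatConfig` class rather than any real Drill config. Drill's plans are JSON; here Java's built-in object serialization stands in for that format so the sketch runs without dependencies. What matters is the assertion: after serialize/deserialize, the copy is a new instance that must still equal the original.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

// Hypothetical stand-in for a format plugin config (not a Drill class).
class SampleFormatConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    final boolean autoCorrectCorruptDates;

    SampleFormatConfig(boolean autoCorrectCorruptDates) {
        this.autoCorrectCorruptDates = autoCorrectCorruptDates;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SampleFormatConfig
            && autoCorrectCorruptDates == ((SampleFormatConfig) o).autoCorrectCorruptDates;
    }

    @Override
    public int hashCode() {
        return Objects.hash(autoCorrectCorruptDates);
    }
}

class SerdeRoundTrip {
    // Writes the object to bytes and reads it back, as the sending and receiving
    // nodes would; Java serialization is only a stand-in for the JSON plan format.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serde round trip failed", e);
        }
    }

    // The check a unit test would make for each config: the copy equals the original.
    static boolean survivesRoundTrip(Serializable value) {
        return value.equals(roundTrip(value));
    }
}
```

In an actual test suite the same check would presumably be repeated for every format plugin config, using the real JSON physical plan instead of Java serialization.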

One example of the errors found is described in the first comment. Another example
is described in DRILL-5166.

*Serde issues:*

1. Could not obtain format plugin during deserialization
A format plugin is created based on a format plugin configuration or its name.
On Drill startup we load information about the available plugins (it is reloaded
each time a storage plugin is updated, which only an admin can do).
When a query is parsed, we try to get the plugin from the available ones; if we cannot
find one, we try to [create one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144],
but at other query execution stages we always assume that the [plugin exists based on configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162].

For example, during query parsing we had to create a format plugin on one node
based on a format configuration.
We then sent a major fragment to a different node, which could not get a format
plugin for this format configuration, so deserialization failed.
To fix this problem, we need to create the format plugin during query
deserialization if it is absent.
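
The proposed fix can be sketched as a get-or-create lookup. All names below (`FormatConfigKey`, `FormatPluginStub`, `NodeFormatRegistry`) are hypothetical and only illustrate the idea; this is not Drill's actual `FileSystemPlugin` code. `getExisting` mimics the old assumption that the plugin is already registered, while `getOrCreate` builds and caches the plugin on first use.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical format config used as a map key; compared by value.
final class FormatConfigKey {
    final String type;
    final String extension;

    FormatConfigKey(String type, String extension) {
        this.type = type;
        this.extension = extension;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FormatConfigKey)) return false;
        FormatConfigKey other = (FormatConfigKey) o;
        return type.equals(other.type) && extension.equals(other.extension);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, extension);
    }
}

// Hypothetical plugin built from a config.
final class FormatPluginStub {
    final FormatConfigKey config;

    FormatPluginStub(FormatConfigKey config) {
        this.config = config;
    }
}

// Hypothetical per-node registry of format plugins keyed by config.
final class NodeFormatRegistry {
    private final Map<FormatConfigKey, FormatPluginStub> plugins = new ConcurrentHashMap<>();

    // Old behaviour: assume the plugin already exists on this node.
    FormatPluginStub getExisting(FormatConfigKey config) {
        FormatPluginStub plugin = plugins.get(config);
        if (plugin == null) {
            throw new IllegalStateException("no format plugin for config");
        }
        return plugin;
    }

    // Fixed behaviour: create (and cache) the plugin if this node has not seen the config yet.
    FormatPluginStub getOrCreate(FormatConfigKey config) {
        return plugins.computeIfAbsent(config, FormatPluginStub::new);
    }
}
```

`ConcurrentHashMap.computeIfAbsent` is used so that fragments deserialized concurrently on the same node end up sharing one plugin instance per config.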
  
2. Missing hashCode and equals
Format plugins are stored in a hash map where the key is the format plugin config.
Since some format plugin configs did not override hashCode and equals,
we could not find the format plugin based on its configuration.
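
The failure mode is easy to reproduce with a plain `HashMap`. In this sketch (hypothetical `IdentityConfig` and `ValueConfig` classes), a plugin is registered under one config instance and then looked up with an equal but distinct instance, which is what a config deserialized on another node is. Without overridden `equals`/`hashCode`, the lookup falls back to object identity and misses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical config WITHOUT equals/hashCode: keys compare by object identity.
class IdentityConfig {
    final String delimiter;

    IdentityConfig(String delimiter) {
        this.delimiter = delimiter;
    }
}

// The same hypothetical config WITH value-based equals/hashCode.
class ValueConfig {
    final String delimiter;

    ValueConfig(String delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ValueConfig && Objects.equals(delimiter, ((ValueConfig) o).delimiter);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(delimiter);
    }
}

class ConfigKeyDemo {
    // Register under one instance, look up with an equal but distinct instance.
    static boolean identityLookupFinds() {
        Map<IdentityConfig, String> plugins = new HashMap<>();
        plugins.put(new IdentityConfig(","), "csv-plugin");
        return plugins.containsKey(new IdentityConfig(","));
    }

    static boolean valueLookupFinds() {
        Map<ValueConfig, String> plugins = new HashMap<>();
        plugins.put(new ValueConfig(","), "csv-plugin");
        return plugins.containsKey(new ValueConfig(","));
    }
}
```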

3. Named format plugin usage
Named format plugin configs allow getting a format plugin by its name, for a
configuration shared among all drillbits.
They are used as aliases for pre-configured format plugins. Users with admin
privileges can modify them at runtime.
Named format plugin configs are used instead of sending all non-default
parameters of the format plugin config; in this case only the name is sent.
Their usage in a distributed system may cause race conditions.
For example:
1. A query is submitted.
2. A Parquet format plugin is created with the following configuration
(autoCorrectCorruptDates=>true).
3. The named format plugin config is serialized with the name 'parquet'.
4. A major fragment is sent to a different node.
5. An admin changes the configuration for the alias 'parquet' on all nodes
to autoCorrectCorruptDates=>false.
6. The named format is deserialized on the different node into a Parquet format
plugin with configuration (autoCorrectCorruptDates=>false), which differs from the configuration the query was planned with.
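
The race can be simulated in a single process. This sketch assumes a hypothetical `AliasTable` standing in for the shared named-config registry and collapses step 4 (shipping the fragment) into a local variable. `byName` sends only the alias, as in the steps above; `byValue` captures the full, immutable config at planning time, i.e. the alternative of sending all non-default parameters.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical Parquet-like config with a single option; immutable.
final class ParquetOptions {
    final boolean autoCorrectCorruptDates;

    ParquetOptions(boolean autoCorrectCorruptDates) {
        this.autoCorrectCorruptDates = autoCorrectCorruptDates;
    }
}

// Hypothetical shared alias table that an admin can change at runtime.
final class AliasTable {
    private final Map<String, ParquetOptions> aliases = new ConcurrentHashMap<>();

    void put(String name, ParquetOptions options) {
        aliases.put(name, options);
    }

    ParquetOptions resolve(String name) {
        return aliases.get(name);
    }
}

final class NamedConfigRace {
    // Only the alias travels; the receiver resolves it against the table as it is later.
    static boolean byName(AliasTable table) {
        table.put("parquet", new ParquetOptions(true));      // step 2: planned with true
        String wire = "parquet";                             // step 3: only the name is serialized
        table.put("parquet", new ParquetOptions(false));     // step 5: admin changes the alias
        return table.resolve(wire).autoCorrectCorruptDates;  // step 6: receiver sees false
    }

    // The full config travels; the receiver uses the values captured at planning time.
    static boolean byValue(AliasTable table) {
        table.put("parquet", new ParquetOptions(true));
        ParquetOptions wire = table.resolve("parquet");      // immutable snapshot of the config
        table.put("parquet", new ParquetOptions(false));
        return wire.autoCorrectCorruptDates;                 // still true
    }
}
```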


  was:
Create unit tests to check that all storage format plugins can be successfully
serialized/deserialized.
Serialization/deserialization usually happens when a query has several major fragments.

One way to check serde is to generate the physical plan (as JSON) and
then submit it back to Drill.

One example of the errors found is described in the first comment. Another example
is described in DRILL-5166.


> Fix serDe errors for format plugins
> -----------------------------------
>
>                 Key: DRILL-5771
>                 URL: https://issues.apache.org/jira/browse/DRILL-5771
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.11.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
