GitHub user ilooner commented on the issue:

    https://github.com/apache/drill/pull/1014
  
    @arina-ielchiieva 
    
    - The parts addressing DRILL-4640 and DRILL-5166 LGTM
    - I think the fix for DRILL-5771 LGTM, but I would like to write down what I 
think is happening and confirm with you that my understanding is correct. This 
is mostly just a learning exercise for me since I am not very familiar with 
this part of the code :).
    
    In DRILL-5771 there were two issues.
    
    ## Race Conditions With Format Plugins
    
    ### Issue
    
    The following used to happen before the fix:
    
      1. When using an existing format plugin, the **FormatPlugin** would 
create a **DrillTable** with a **NamedFormatPluginConfig** which only contains 
the name of the format plugin to use.
      1. The **ScanOperator** created for a **DrillTable** will contain the 
**NamedFormatPluginConfig**.
      1. When the **ScanOperators** are serialized into the physical plan, the 
serialized **ScanOperator** will only contain the name of the format plugin to 
use.
      1. When a worker deserializes the physical plan to do a scan, it gets the 
name of the **FormatPluginConfig** to use.
      1. The worker then looks up the correct **FormatPlugin** in the 
**FormatCreator** using the name it has.
      1. The worker can get into trouble if the **FormatPlugins** it has cached 
in its **FormatCreator** are out of sync with the rest of the cluster.
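
    A minimal sketch of the failure mode in the steps above, as I understand 
it. All class and method names here (`StaleLookupSketch`, `NodeFormatCache`, 
`FormatConfig`, `resolveOnWorker`) are hypothetical stand-ins for illustration 
and are not Drill's actual classes. It only assumes that each node keeps its 
own name-to-config cache and that the plan carries just the name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: simplified stand-ins, not Drill's real classes.
class StaleLookupSketch {

    // Stand-in for a format plugin configuration (here: just a field delimiter).
    static final class FormatConfig {
        final String delimiter;
        FormatConfig(String delimiter) { this.delimiter = delimiter; }
    }

    // Stand-in for the per-node cache that maps a format name to its config.
    static final class NodeFormatCache {
        private final Map<String, FormatConfig> byName = new HashMap<>();
        void put(String name, FormatConfig config) { byName.put(name, config); }
        FormatConfig lookup(String name) { return byName.get(name); }
    }

    // The plan only carries the name, so the worker resolves it in its own cache.
    static FormatConfig resolveOnWorker(String nameInPlan, NodeFormatCache workerCache) {
        return workerCache.lookup(nameInPlan);
    }

    public static void main(String[] args) {
        NodeFormatCache planner = new NodeFormatCache();
        NodeFormatCache worker = new NodeFormatCache();
        planner.put("csv", new FormatConfig(","));
        worker.put("csv", new FormatConfig(","));

        // The "csv" format is updated, but the worker has not seen the update yet.
        planner.put("csv", new FormatConfig("|"));

        FormatConfig planned = planner.lookup("csv");
        FormatConfig resolved = resolveOnWorker("csv", worker);
        System.out.println("planned with delimiter: " + planned.delimiter);
        System.out.println("worker uses delimiter:  " + resolved.delimiter);
        System.out.println("in sync: " + planned.delimiter.equals(resolved.delimiter));
    }
}
```

    In this sketch the planner and the worker agree on the name "csv" but 
disagree on what it means, which is the out-of-sync situation described in the 
last step.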
    
    ### Fix
    
    Race conditions are eliminated because the **DrillTables** returned by the 
**FormatPlugins** no longer contain a **NamedFormatPluginConfig**. Instead, they 
contain the full **FormatPluginConfig**, not just a name alias. So when a query 
is executed:
      1. The **ScanOperator** contains the complete **FormatPluginConfig**.
      1. When the physical plan is serialized, it contains the complete 
**FormatPluginConfig** for each scan operator.
      1. When a worker node deserializes the **ScanOperator**, it also has the 
complete **FormatPluginConfig**, so it can reconstruct the **FormatPlugin** 
correctly. Previously, the worker had to look up the **FormatPlugin** by name 
in the **FormatCreator**, whose cache may be out of sync with the rest of the 
cluster.
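
    A minimal sketch of the idea behind the fix, under the same caveat: the 
names `FullConfigSketch`, `FormatConfig`, `ScanSpec` and `readOnWorker` are 
hypothetical and do not come from the Drill code base. The scan description 
carries the whole config, so the worker needs no name lookup in a local cache.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: simplified stand-ins, not Drill's real classes.
class FullConfigSketch {

    // Stand-in for a complete format plugin configuration.
    static final class FormatConfig {
        final String delimiter;
        FormatConfig(String delimiter) { this.delimiter = delimiter; }
    }

    // Stand-in for a scan operator: it embeds the full config, not a name.
    static final class ScanSpec {
        final String path;
        final FormatConfig format;
        ScanSpec(String path, FormatConfig format) {
            this.path = path;
            this.format = format;
        }
    }

    // The worker reads using only what the plan carries; no cache is consulted.
    static String[] readOnWorker(ScanSpec spec, String line) {
        return line.split(Pattern.quote(spec.format.delimiter));
    }

    public static void main(String[] args) {
        ScanSpec spec = new ScanSpec("/data/t.psv", new FormatConfig("|"));
        String[] fields = readOnWorker(spec, "a|b|c");
        System.out.println(fields.length); // prints 3
    }
}
```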
    
    ## FormatPluginConfig Equals and HashCode
    
    ### Issue
    
    The **FileSystemPlugin** looks up **FormatPlugins** corresponding to a 
**FormatPluginConfig** in formatPluginsByConfig. However, the 
**FormatPluginConfig** implementations didn't override equals and hashCode.
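
    To illustrate why missing equals and hashCode matter for a lookup keyed by 
config, here is a small sketch of general Java `HashMap` behavior. The classes 
`IdentityConfig` and `ValueConfig` are hypothetical and are not Drill's config 
classes. A key type that inherits identity-based equals and hashCode from 
`Object` is only found when the very same instance is used for the lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: simplified stand-ins, not Drill's real classes.
class ConfigKeySketch {

    // No equals/hashCode overrides: inherits identity semantics from Object.
    static final class IdentityConfig {
        final String delimiter;
        IdentityConfig(String delimiter) { this.delimiter = delimiter; }
    }

    // Value semantics: two configs with the same contents are equal.
    static final class ValueConfig {
        final String delimiter;
        ValueConfig(String delimiter) { this.delimiter = delimiter; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValueConfig)) return false;
            return Objects.equals(delimiter, ((ValueConfig) o).delimiter);
        }

        @Override
        public int hashCode() { return Objects.hash(delimiter); }
    }

    public static void main(String[] args) {
        Map<IdentityConfig, String> byIdentity = new HashMap<>();
        byIdentity.put(new IdentityConfig(","), "csv-plugin");
        // A second instance with the same contents, e.g. one rebuilt from a plan.
        System.out.println(byIdentity.get(new IdentityConfig(","))); // prints null

        Map<ValueConfig, String> byValue = new HashMap<>();
        byValue.put(new ValueConfig(","), "csv-plugin");
        System.out.println(byValue.get(new ValueConfig(","))); // prints csv-plugin
    }
}
```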
    


