[ 
https://issues.apache.org/jira/browse/SOLR-7061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Peng updated SOLR-7061:
----------------------------
    Description: 
Script Transformer has been widely used to modify the value of columns of 
selected rows from targeting data source (such as SQL Database) based on 
specific logics, before writing to Solr as documents. However, current 
implementation has the following limitations:
*1. It is not possible to pass constant values or resolved variables (e.g., 
$\{TABLE.COLUMN\} ) as arguments to a script function.*
*2. Cross-entity row data exchange is not possible as well.*

In our use case, we have complex nested entities and rely heavily on the script 
functions to transform table rows while doing data import. Sometimes for each 
single document, we need to get the selected column values from a parent entity 
into current entity for doing value transformation and applying if-else logics. 
To achieve this, we need to join with others tables in the SQL of current 
entity, which is quite resource-consuming, especially for large tables.

Therefore, we have done some improvements to allow us to pass selected column 
values from entity A to another entity B as its function arguments by utilizing 
variable resolver.

Here is an example about how it works. Suppose we have the following 
configuration:

{code}
<dataConfig>
    <dataSource name="ProductDB" 
                driver="oracle.jdbc.driver.OracleDriver" 
                
url="jdbc:oracle:thin:@${dataimporter.request.host}:${dataimporter.request.port}/${dataimporter.request.name}"
 
                user="${dataimporter.request.user}" 
                password="${dataimporter.request.password}" 
                autoCommit="true"/>
    <!-- ScriptTransformer functions -->
    <script><![CDATA[
        function processItemRow(row, resolvedVars) {
            var isOnSale = resolvedVars.get("${PRODUCT.IS_ONSALE}");
            var discount = row.get("DISCOUNT_RATE");
            var price = row.get("PRICE");
            
            if(isOnSale) {
              row.put("PRICE", price * discount);
            }
            else
              row.put("PRICE", price);
            
            return row;
        }
        ]]>
    </script>
    <document name="EC_SHOP">
        <entity dataSource="ProductDB" name="PRODUCT" 
                query="SELECT PRODUCT_ID, TITLE, IS_ONSALE FROM PRODUCT">
            <field column="PRODUCT_ID" name="PRODUCT_ID"/>
            <field column="TITLE" name="TITLE"/>
            <field column="IS_ONSALE" name="IS_ONSALE"/>   
            
            <entity dataSource="ProductDB" name="ITEM" 
                    transformer="script:processItemRow(${PRODUCT.IS_ONSALE})"
                    query="SELECT PRICE FROM ITEM WHERE PRODUCT_ID = 
'${PRODUCT.PRODUCT_ID}'">
                <field column="PRICE" name="PRICE"/>
            </entity>
        </entity>
  </document >
</dataConfig>
{code}


As demonstrated above, now we can get access to the value of column *IS_ONSALE* 
of table *PRODUCT* from the entity of table *ITEM* by passing 
*$\{PRODUCT.IS_ONSALE\}* as an argument of the function *processItemRow* to 
determine if we should give some discounts for the production price. The 
signature of function has a secondary argument (named *resolvedVars* here) for 
passing the map of column values resolved from other previous entities.

This improvement gives more flexibility for script functions to exchange row 
data cross entities and do more complex processing for entity rows.


  was:
Script Transformer has been widely used to modify the value of columns of 
selected rows from targeting data source (such as SQL Database) based on 
specific logics, before writing to Solr as documents. However, current 
implementation has the following limitations:
*1. It is not possible to pass constant values or resolved variables (e.g., 
$\{TABLE.COLUMN\} ) as arguments to a script function.*
*2. Cross-entity row data exchange is not possible as well.*

In our use case, we have complex nested entities and rely heavily on the script 
functions to transform table rows while doing data import. Sometimes for each 
single document, we need to get the selected column values from a parent entity 
into current entity for doing value transformation and applying if-else logics. 
To achieve this, we need to join with others tables in the SQL of current 
entity, which is quite resource-consuming, especially for large tables.

Therefore, we have done some improvements to allow us to pass selected column 
values from entity A to another entity B as its function arguments by utilizing 
variable resolver.

Here is an example about how it works. Suppose we have the following 
configuration:

{code}
<dataConfig>
    <dataSource name="ProductDB" 
                driver="oracle.jdbc.driver.OracleDriver" 
                
url="jdbc:oracle:thin:@${dataimporter.request.host}:${dataimporter.request.port}/${dataimporter.request.name}"
 
                user="${dataimporter.request.user}" 
                password="${dataimporter.request.password}" 
                autoCommit="true"/>
    <!-- ScriptTransformer functions -->
    <script><![CDATA[
        function processItemRow(row, resolvedVars) {
            var isOnSale = resolvedVars.get("${PRODUCT.IS_ONSALE}");
            var discount = row.get("DISCOUNT_RATE");
            var price = row.get("PRICE");
            
            if(isOnSale) {
              row.put("PRICE", price * discount);
            }
            else
              row.put("PRICE", price);
            
            return row;
        }
        ]]>
    </script>
    <document name="EC_SHOP">
        <entity dataSource="ProductDB" name="PRODUCT" 
                query="SELECT PRODUCT_ID, TITLE, IS_ONSALE FROM PRODUCT">
            <field column="PRODUCT_ID" name="PRODUCT_ID"/>
            <field column="TITLE" name="TITLE"/>
            <field column="IS_ONSALE" name="IS_ONSALE"/>   
            
            <entity dataSource="ProductDB" name="ITEM" 
                    transformer="script:processItemRow(${PRODUCT.IS_ONSALE})"
                    query="SELECT PRICE FROM ITEM WHERE PRODUCT_ID = 
'${PRODUCT.PRODUCT_ID}'">
                <field column="PRICE" name="PRICE"/>
            </entity>
        </entity>
</dataConfig>
{code}


As demonstrated above, now we can get access to the value of column *IS_ONSALE* 
of table *PRODUCT* from the entity of table *ITEM* by passing 
*$\{PRODUCT.IS_ONSALE\}* as an argument of the function *processItemRow* to 
determine if we should give some discounts for the production price. The 
signature of function has a secondary argument (named *resolvedVars* here) for 
passing the map of column values resolved from other previous entities.

This improvement gives more flexibility for script functions to exchange row 
data cross entities and do more complex processing for entity rows.



> Cross-Entity Variable Resolving and Arguments for ScriptTransformer Functions
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-7061
>                 URL: https://issues.apache.org/jira/browse/SOLR-7061
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.10.3
>            Reporter: Mark Peng
>            Priority: Minor
>              Labels: dataimport, transformers
>         Attachments: SOLR-7061.patch
>
>
> Script Transformer has been widely used to modify the value of columns of 
> selected rows from targeting data source (such as SQL Database) based on 
> specific logics, before writing to Solr as documents. However, current 
> implementation has the following limitations:
> *1. It is not possible to pass constant values or resolved variables (e.g., 
> $\{TABLE.COLUMN\} ) as arguments to a script function.*
> *2. Cross-entity row data exchange is not possible as well.*
> In our use case, we have complex nested entities and rely heavily on the 
> script functions to transform table rows while doing data import. Sometimes 
> for each single document, we need to get the selected column values from a 
> parent entity into current entity for doing value transformation and applying 
> if-else logics. To achieve this, we need to join with others tables in the 
> SQL of current entity, which is quite resource-consuming, especially for 
> large tables.
> Therefore, we have done some improvements to allow us to pass selected column 
> values from entity A to another entity B as its function arguments by 
> utilizing variable resolver.
> Here is an example about how it works. Suppose we have the following 
> configuration:
> {code}
> <dataConfig>
>     <dataSource name="ProductDB" 
>                 driver="oracle.jdbc.driver.OracleDriver" 
>                 
> url="jdbc:oracle:thin:@${dataimporter.request.host}:${dataimporter.request.port}/${dataimporter.request.name}"
>  
>                 user="${dataimporter.request.user}" 
>                 password="${dataimporter.request.password}" 
>                 autoCommit="true"/>
>     <!-- ScriptTransformer functions -->
>     <script><![CDATA[
>         function processItemRow(row, resolvedVars) {
>             var isOnSale = resolvedVars.get("${PRODUCT.IS_ONSALE}");
>             var discount = row.get("DISCOUNT_RATE");
>             var price = row.get("PRICE");
>             
>             if(isOnSale) {
>               row.put("PRICE", price * discount);
>             }
>             else
>               row.put("PRICE", price);
>             
>             return row;
>         }
>         ]]>
>     </script>
>     <document name="EC_SHOP">
>         <entity dataSource="ProductDB" name="PRODUCT" 
>                 query="SELECT PRODUCT_ID, TITLE, IS_ONSALE FROM PRODUCT">
>             <field column="PRODUCT_ID" name="PRODUCT_ID"/>
>             <field column="TITLE" name="TITLE"/>
>             <field column="IS_ONSALE" name="IS_ONSALE"/>   
>             
>             <entity dataSource="ProductDB" name="ITEM" 
>                     transformer="script:processItemRow(${PRODUCT.IS_ONSALE})"
>                     query="SELECT PRICE FROM ITEM WHERE PRODUCT_ID = 
> '${PRODUCT.PRODUCT_ID}'">
>                 <field column="PRICE" name="PRICE"/>
>             </entity>
>         </entity>
>   </document >
> </dataConfig>
> {code}
> As demonstrated above, now we can get access to the value of column 
> *IS_ONSALE* of table *PRODUCT* from the entity of table *ITEM* by passing 
> *$\{PRODUCT.IS_ONSALE\}* as an argument of the function *processItemRow* to 
> determine if we should give some discounts for the production price. The 
> signature of function has a secondary argument (named *resolvedVars* here) 
> for passing the map of column values resolved from other previous entities.
> This improvement gives more flexibility for script functions to exchange row 
> data cross entities and do more complex processing for entity rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to