[ 
https://issues.apache.org/jira/browse/SOLR-7061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303038#comment-14303038
 ] 

Noble Paul commented on SOLR-7061:
----------------------------------

bq.We want to cache as minimum data as possible for each function. ContextImpl 
uses a HashMap<String, Object> to store document-level data in DocWrapper for 
all entities, no matter if they will be used by any function or not, which may 
cache unnecessary values in memory.

I fail to understand this fully. Th context only stores the values you 
explicitly set. So where are the unnecessary values coming from?

> Cross-Entity Variable Resolving and Arguments for ScriptTransformer Functions
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-7061
>                 URL: https://issues.apache.org/jira/browse/SOLR-7061
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.10.3
>            Reporter: Mark Peng
>            Priority: Minor
>              Labels: dataimport, transformers
>         Attachments: SOLR-7061.patch
>
>
> Script Transformer has been widely used to modify the value of columns of 
> selected rows from targeting data source (such as SQL Database) based on 
> specific logics, before writing to Solr as documents. However, current 
> implementation has the following limitations:
> *1. It is not possible to pass constant values or resolved variables (e.g., 
> $\{TABLE.COLUMN\} ) as arguments to a script function.*
> *2. Cross-entity row data exchange is not possible as well.*
> In our use case, we have complex nested entities and rely heavily on the 
> script functions to transform table rows while doing data import. Sometimes 
> for each single document, we need to get the selected column values from a 
> parent entity into current entity for doing value transformation and applying 
> if-else logics. To achieve this, we need to join with others tables in the 
> SQL of current entity, which is quite resource-consuming, especially for 
> large tables.
> Therefore, we have done some improvements to allow us to pass selected column 
> values from entity A to another entity B as its function arguments by 
> utilizing variable resolver.
> Here is an example about how it works. Suppose we have the following 
> configuration:
> {code}
> <dataConfig>
>     <dataSource name="ProductDB" 
>                 driver="oracle.jdbc.driver.OracleDriver" 
>                 url="jdbc:oracle:thin:@${dataimporter.request.host}:
>                        
> ${dataimporter.request.port}/${dataimporter.request.name}" 
>                 user="${dataimporter.request.user}" 
>                 password="${dataimporter.request.password}" 
>                 autoCommit="true"/>
>     <!-- ScriptTransformer functions -->
>     <script><![CDATA[
>         function processItemRow(row, resolvedVars) {
>             var isOnSale = resolvedVars.get("${PRODUCT.IS_ONSALE}");
>             var discount = resolvedVars.get("${PRODUCT.DISCOUNT_RATE}");
>             var price = row.get("PRICE");
>             
>             if(isOnSale) {
>               row.put("PRICE", price * discount);
>             }
>             else
>               row.put("PRICE", price);
>             
>             return row;
>         }
>         ]]>
>     </script>
>     <document name="EC_SHOP">
>         <entity dataSource="ProductDB" name="PRODUCT" 
>                 query="SELECT PRODUCT_ID, TITLE, IS_ONSALE, DISCOUNT_RATE 
> FROM PRODUCT">
>             <field column="PRODUCT_ID" name="PRODUCT_ID"/>
>             <field column="TITLE" name="TITLE"/>
>             <field column="IS_ONSALE" name="IS_ONSALE"/>
>             <field column="DISCOUNT_RATE" name="DISCOUNT_RATE"/>              
>  
>             
>             <entity dataSource="ProductDB" name="ITEM" 
>                     
> transformer="script:processItemRow(${PRODUCT.IS_ONSALE},${PRODUCT.DISCOUNT_RATE})"
>                     query="SELECT PRICE FROM ITEM WHERE PRODUCT_ID = 
> '${PRODUCT.PRODUCT_ID}'">
>                 <field column="PRICE" name="PRICE"/>
>             </entity>
>         </entity>
>     </document>
> </dataConfig>
> {code}
> As demonstrated above, now we can get access to the value of column 
> *IS_ONSALE* and *DISCOUNT_RATE* of table *PRODUCT* from the entity of table 
> *ITEM* by passing *$\{PRODUCT.IS_ONSALE\}* and *$\{PRODUCT.DISCOUNT_RATE\}* 
> as arguments of the function *processItemRow* to determine if we should give 
> some discounts for the production price. The signature of function has a 
> secondary argument (named *resolvedVars* here) for passing the map of column 
> values resolved from other previous entities.
> This improvement gives more flexibility for script functions to exchange row 
> data cross entities (even cross datasource) and do more complex processing 
> for entity rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to