[ 
https://issues.apache.org/jira/browse/BEAM-14166?focusedWorklogId=769415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769415
 ]

ASF GitHub Bot logged work on BEAM-14166:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/May/22 04:30
            Start Date: 12/May/22 04:30
    Worklog Time Spent: 10m 
      Work Description: mosche commented on code in PR #17172:
URL: https://github.com/apache/beam/pull/17172#discussion_r870936447


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/FieldValueGetter.java:
##########
@@ -33,5 +33,9 @@
   @Nullable
   ValueT get(ObjectT object);
 
+  default @Nullable Object getRaw(ObjectT object) {

Review Comment:
   Thanks so much for having a look @TheNeuralBit 🙏 
   
   `getRaw()` was based on a 
[conversation](https://lists.apache.org/thread/gjq453fm32s76zlvjs4kb5g3rgxnh7gs)
 with @reuvenlax.
   
   > getValues() is maybe poorly named - might be better called getRawValues. 
What you're looking for is probably the getBaseValues() method.
   > getValues is mostly used in code that knows exactly what it's doing for 
optimization purposes. It goes along with the attachValues method, which is 
similarly tricky to use. It's there to enable 0-copy code, but not 
necessarily intended for general consumption. 
   
   `RowWithGetters.getValues()` returns the "raw", unmodified result of the 
getters:
   ```java
   public List<Object> getValues() {
     return getters.stream()
         .map(g -> g.get(getterTarget))
         .collect(Collectors.toList());
   }
   ```
   
   As I am pushing down the transformation of the getter result into the getter 
itself, I needed a way to bypass that in order to maintain the current 
semantics of `getValues()`. Let me know if the name makes sense given that 
context. 
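
To make the idea concrete, here is a minimal, self-contained sketch. It is not the actual Beam code: `Getter`, `ConvertingGetter`, `Event` and the static `getValues` helper are hypothetical stand-ins, and the default `getRaw()` delegating to `get()` is an assumption, since the diff above does not show the method body.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.function.Function;

public class GetRawSketch {

  /** Simplified, hypothetical stand-in for FieldValueGetter (not the real Beam interface). */
  interface Getter<ObjectT, ValueT> {
    ValueT get(ObjectT object);

    /** Assumption: unless overridden, the raw value is whatever get() returns. */
    default Object getRaw(ObjectT object) {
      return get(object);
    }
  }

  /** Hypothetical getter with the conversion pushed down into get(). */
  static final class ConvertingGetter<ObjectT, RawT, ValueT> implements Getter<ObjectT, ValueT> {
    private final Function<ObjectT, RawT> rawGetter;
    private final Function<RawT, ValueT> conversion;

    ConvertingGetter(Function<ObjectT, RawT> rawGetter, Function<RawT, ValueT> conversion) {
      this.rawGetter = rawGetter;
      this.conversion = conversion;
    }

    @Override
    public ValueT get(ObjectT object) {
      // Converted value, as seen by regular callers.
      RawT raw = rawGetter.apply(object);
      return raw == null ? null : conversion.apply(raw);
    }

    @Override
    public Object getRaw(ObjectT object) {
      // Bypasses the conversion and returns the untouched getter result.
      return rawGetter.apply(object);
    }
  }

  /** Hypothetical getter target. */
  static final class Event {
    final String name;
    final Date createdAt;

    Event(String name, Date createdAt) {
      this.name = name;
      this.createdAt = createdAt;
    }
  }

  /** Mirrors the semantics of getValues(): collects the raw, unconverted values. */
  static <T> List<Object> getValues(List<Getter<T, ?>> getters, T target) {
    List<Object> values = new ArrayList<>();
    for (Getter<T, ?> getter : getters) {
      values.add(getter.getRaw(target));
    }
    return values;
  }

  public static void main(String[] args) {
    Event event = new Event("click", new Date(0L));

    Getter<Event, String> name = e -> e.name;
    Getter<Event, Instant> createdAt =
        new ConvertingGetter<Event, Date, Instant>(e -> e.createdAt, Date::toInstant);

    List<Getter<Event, ?>> getters = new ArrayList<>();
    getters.add(name);
    getters.add(createdAt);

    // get() yields the converted type, getValues() the raw one.
    System.out.println(createdAt.get(event).getClass().getSimpleName());
    System.out.println(getValues(getters, event).get(1).getClass().getSimpleName());
  }
}
```

With this shape, `get()` can return the converted value while `getValues()` keeps returning the unmodified getter results, which is the behaviour the name `getRaw()` is meant to convey.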





Issue Time Tracking
-------------------

    Worklog Id:     (was: 769415)
    Time Spent: 3h 10m  (was: 3h)

> Improvements to RowWithGetter
> -----------------------------
>
>                 Key: BEAM-14166
>                 URL: https://issues.apache.org/jira/browse/BEAM-14166
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Moritz Mack
>            Assignee: Moritz Mack
>            Priority: P2
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Various improvements to getValue(fieldIdx) in RowWithGetters mentioned 
> [here|https://github.com/apache/beam/pull/16947#discussion_r833602836]:
>  * Minimize the memory overhead of the cache by using either an index lookup 
> (array) or a single hash map if the number of fields exceeds the initial 
> hash map capacity
>  * The cache should be checked before calling a getter to avoid any 
> potentially unnecessary conversion in the getter itself.
>  * [Nested 
> rows|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/RowWithGetters.java#L111]
>  should be cached, otherwise the cache of such nested rows can't be leveraged.
>  * Handling of collections / maps / iterables can be significantly improved 
> by simply skipping the transform in all cases where {{getValue}} for members 
> is the [identity 
> transform|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/RowWithGetters.java#L142].
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
