[ https://issues.apache.org/jira/browse/SPARK-41395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce Robbins updated SPARK-41395:
----------------------------------
    Description: 
The following returns the wrong answer:
{noformat}
set spark.sql.codegen.wholeStage=false;
set spark.sql.codegen.factoryMode=NO_CODEGEN;

select max(col1), max(col2) from values
(cast(null as decimal(27,2)), cast(null as decimal(27,2))),
(cast(77.77 as decimal(27,2)), cast(245.00 as decimal(27,2)))
as data(col1, col2);

+---------+---------+
|max(col1)|max(col2)|
+---------+---------+
|null     |239.88   |
+---------+---------+
{noformat}
This is because {{InterpretedMutableProjection}} inappropriately uses {{InternalRow#setNullAt}} to set null for decimal types with precision > {{Decimal.MAX_LONG_DIGITS}}.

The path to corruption goes like this.

Unsafe buffer at start:
{noformat}
                          offset/len for    offset/len for
                          1st decimal       2nd decimal
offset: 0                 8                 16 (0x10)         24 (0x18)         32 (0x20)
data:   0300000000000000  0000000018000000  0000000028000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000
{noformat}
When processing the first incoming row ([null, null]), {{InterpretedMutableProjection}} calls {{setNullAt}} for the decimal fields. As a result, the pointers to the storage areas for the two decimals in the variable-length region get zeroed out (the first snippet after the log excerpt below shows this step in isolation).

Buffer after projecting the first row (null, null):
{noformat}
                          offset/len for    offset/len for
                          1st decimal       2nd decimal
offset: 0                 8                 16 (0x10)         24 (0x18)         32 (0x20)
data:   0300000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000
{noformat}
When it's time to project the second row into the buffer, {{UnsafeRow#setDecimal}} uses the zeroed-out offsets and therefore overwrites the null-tracking bit set with decimal data:
{noformat}
        null-tracking
        bit area
offset: 0                 8                 16 (0x10)         24 (0x18)         32 (0x20)
data:   5db4000000000000  0000000000000000  0200000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000
{noformat}
The null-tracking bit set is overwritten with 239.88 (0x5db4) rather than 245.00 (0x5fb4) because {{setDecimal}} indirectly calls {{setNotNullAt(1)}}, which turns off the null-tracking bit associated with the field at index 1. In addition, the decimal at field index 0 now reads as null because of the corruption of the null-tracking bit set.

When a decimal type with precision > {{Decimal.MAX_LONG_DIGITS}} is null, {{InterpretedMutableProjection}} should write a null {{Decimal}} value rather than call {{setNullAt}} (a sketch of this approach appears after the log excerpt below).

This bug could get exercised during codegen fallback. Take for example this case where I forced codegen to fail for the {{Greatest}} expression:
{noformat}
spark-sql> select max(col1), max(col2) from values
(cast(null as decimal(27,2)), cast(null as decimal(27,2))),
(cast(77.77 as decimal(27,2)), cast(245.00 as decimal(27,2)))
as data(col1, col2);
22/12/05 08:18:54 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 58, Column 1: ';' expected instead of 'if'
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 58, Column 1: ';' expected instead of 'if'
	at org.codehaus.janino.TokenStreamImpl.compileException(TokenStreamImpl.java:362)
	at org.codehaus.janino.TokenStreamImpl.read(TokenStreamImpl.java:149)
	at org.codehaus.janino.Parser.read(Parser.java:3787)
...
22/12/05 08:18:56 WARN MutableProjection: Expr codegen error and falling back to interpreter mode
java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 43, Column 1: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 43, Column 1: ';' expected instead of 'boolean'
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1583)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1580)
	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
	... 36 more
...
NULL	239.88	<== incorrect result, should be (77.77, 245.00)
Time taken: 6.132 seconds, Fetched 1 row(s)
spark-sql>
{noformat}
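For reference, the offset-zeroing step described above can be reproduced in isolation. The snippet below is not from the Spark code base; it is a small standalone sketch that assumes spark-catalyst is on the classpath and relies on internal APIs ({{UnsafeProjection}}, {{InternalRow}}) that are not stable:
{noformat}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
import org.apache.spark.sql.types._

// Build an UnsafeRow with two decimal(27,2) columns. Because the precision is greater
// than Decimal.MAX_LONG_DIGITS, each value lives in the variable-length region and the
// fixed-length slot holds an offset/length word pointing at it.
val schema = StructType(Seq(
  StructField("col1", DecimalType(27, 2)),
  StructField("col2", DecimalType(27, 2))))
val toUnsafe = UnsafeProjection.create(schema)
val row = toUnsafe(InternalRow(Decimal("77.77"), Decimal("245.00")))

println(row.getLong(0).toHexString)  // offset/length word for col1: non-zero
row.setNullAt(0)                     // what InterpretedMutableProjection does for a null value
println(row.getLong(0).toHexString)  // now 0: the pointer into the variable-length region is
                                     // gone, so a later setDecimal on this field writes at
                                     // offset 0, i.e. on top of the null-tracking bit set
{noformat}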
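And here is a minimal sketch of the suggested direction for a fix, using a hypothetical helper (the actual patch may look different): for wide decimals, write the null through {{InternalRow#setDecimal}}, which lets {{UnsafeRow}} keep the offset/length word, instead of calling {{setNullAt}}:
{noformat}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.{DataType, Decimal, DecimalType}

// Hypothetical helper illustrating the proposed behavior; not the actual patch.
def writeNull(target: InternalRow, ordinal: Int, dataType: DataType): Unit = dataType match {
  case dt: DecimalType if dt.precision > Decimal.MAX_LONG_DIGITS =>
    // UnsafeRow#setDecimal(ordinal, null, precision) marks the field null but preserves the
    // offset into the variable-length region, so later non-null writes stay in bounds.
    target.setDecimal(ordinal, null, dt.precision)
  case _ =>
    target.setNullAt(ordinal)
}
{noformat}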
> InterpretedMutableProjection can corrupt unsafe buffer when used with decimal data
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-41395
>                 URL: https://issues.apache.org/jira/browse/SPARK-41395
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.1, 3.2.3, 3.4.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org