I think I see what happened, and it looks like a very easy fix. The filter was simply omitted at the point where IndexBulkloadPOperator needs to pass it to the metadata provider to get the correct runtime operators and constraints. IndexInsertDeletePOperator handles it correctly, so that seems to be what caused the observed difference between insert and load. Everything was fine at the logical level; the filter was just being lost when the plan was translated to physical operators.
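To make the failure mode concrete, here is a rough, self-contained Java sketch. It is not AsterixDB code: the class and method names (BulkLoadNullFilterSketch, filterNonNull, computeMbr, bulkLoad) are hypothetical stand-ins, and a null array stands in for a NULL point key. It only shows why a null-key filter that is dropped on the bulk-load path would surface as an exception in the MBR computation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names are AsterixDB or Hyracks APIs.
final class BulkLoadNullFilterSketch {

    // Stand-in for an MBR computation that expects doubles in every key.
    // Returns {minX, minY, maxX, maxY}; a null key is treated as unsupported.
    static double[] computeMbr(List<double[]> points) {
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : points) {
            if (p == null) {
                throw new UnsupportedOperationException(
                        "Value provider for type NULL is not implemented");
            }
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0]);
            maxY = Math.max(maxY, p[1]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    // Stand-in for the filter the logical plan already contains: drop null keys.
    static List<double[]> filterNonNull(List<double[]> keys) {
        List<double[]> kept = new ArrayList<>();
        for (double[] key : keys) {
            if (key != null) {
                kept.add(key);
            }
        }
        return kept;
    }

    // applyFilter == false models a physical operator that forgot to pass the filter along.
    static double[] bulkLoad(List<double[]> keys, boolean applyFilter) {
        List<double[]> input = applyFilter ? filterNonNull(keys) : keys;
        return computeMbr(input);
    }

    public static void main(String[] args) {
        // Three records, as in the report: two without a point, one with point(4.0, 5.0).
        List<double[]> keys = Arrays.asList(null, null, new double[] {4.0, 5.0});
        System.out.println(Arrays.toString(bulkLoad(keys, true)));
        try {
            bulkLoad(keys, false);
        } catch (UnsupportedOperationException e) {
            System.out.println("unfiltered load failed: " + e.getMessage());
        }
    }
}
```

In this sketch the filtered path never hands a null to computeMbr, which mirrors the intent that nulls should never reach the index in the first place.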
-Ian

On Mon, Dec 14, 2015 at 1:57 PM, Ian Maxon <[email protected]> wrote:
> It's definitely true that we at least intend to filter out nulls. the
> use of createFilterExpression() in
> IntroduceSecondaryIndexInsertDeleteRule makes this fairly clear.
> Unless I am misinterpreting the meaning of a tuple being fed into the
> RTreeBulkloader (i.e. if I see a tuple there, it's meant to be
> inserted), It doesn't seem to be applied properly somehow.
>
> On Wed, Dec 2, 2015 at 8:55 AM, Mike Carey <[email protected]> wrote:
>> Thx! (Stupid Q, I know, but I'd forgotten what we decided there N years
>> ago... :-))
>>
>> On 12/2/15 7:41 AM, Sattam Alsubaiee wrote:
>>>
>>> Same for all other indexes. Nulls won't be sent to any index, they are
>>> always filtered.
>>>
>>> Sattam
>>> On Dec 2, 2015 6:31 PM, "Mike Carey" <[email protected]> wrote:
>>>
>>>> And for other types of indexes? (Seems like this would be behavior we'd
>>>> want at the index level, one way or the other, vs. the detailed kind of
>>>> index level?)
>>>>
>>>> On 12/2/15 5:15 AM, Sattam Alsubaiee wrote:
>>>>
>>>>> Nulls won't be sent to the R-tree. They will be filtered and if they are
>>>>> not filtered, then it is a bug.
>>>>>
>>>>> Cheers,
>>>>> Sattam
>>>>>
>>>>> On Wed, Dec 2, 2015 at 5:25 AM, Ian Maxon (JIRA) <[email protected]> wrote:
>>>>>
>>>>>> [ https://issues.apache.org/jira/browse/ASTERIXDB-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>>>>
>>>>>> Ian Maxon reopened ASTERIXDB-1201:
>>>>>> ----------------------------------
>>>>>>
>>>>>> I think this should still happen with more data of a similar vein (i.e.
>>>>>> lots of nulls)? I see in the code actually where this should happen and it
>>>>>> must just not be triggered in the modified bulkload for that particular
>>>>>> data. The real issue is when we try to calculate the MBR of a null shape.
>>>>>> adjust/calculateMBRImpl just doesn't handle that, it's expecting to see
>>>>>> something with doubles in a corner of the shape.
>>>>>>
>>>>>> RTree built on the optional field refuses to load the NULL value when
>>>>>> executing the bulk load
>>>>>> ---------------------------------------------------------------------------------------------
>>>>>>
>>>>>>> Key: ASTERIXDB-1201
>>>>>>> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1201
>>>>>>> Project: Apache AsterixDB
>>>>>>> Issue Type: Bug
>>>>>>> Components: Storage
>>>>>>> Reporter: Jianfeng Jia
>>>>>>> Assignee: Ian Maxon
>>>>>>>
>>>>>>> When I build a RTree index on an optional field, it will throw "Value
>>>>>>> provider for type NULL is not implemented" exception when operates the
>>>>>>> bulk load.
>>>>>>>
>>>>>>> Here is the reproducible script:
>>>>>>> {code}
>>>>>>> drop dataverse test if exists;
>>>>>>> create dataverse test;
>>>>>>> use dataverse test;
>>>>>>> create type t_record as closed {
>>>>>>> fa : int64,
>>>>>>> fb: int64?,
>>>>>>> fc : point?
>>>>>>> }
>>>>>>> create dataset ds_set (t_record) primary key fa;
>>>>>>> create index bidx on ds_set(fb) type btree;
>>>>>>> create index cidx on ds_set(fc) type rtree;
>>>>>>> insert into dataset ds_set ( [{"fa":1}, {"fa":2, "fb":3}, {"fa":3, "fc":point("4.0,5.0")}]);
>>>>>>> load dataset ds_set
>>>>>>> using localfs
>>>>>>> (("path"="172.17.0.2:///data/twitter/test.adm"),("format"="adm"));
>>>>>>> {code}
>>>>>>> The "insert" and "load" statements are run separately.
>>>>>>> The test.adm uses the same three records:
>>>>>>> {code}
>>>>>>> {"fa":1}
>>>>>>> {"fa":2, "fb":3}
>>>>>>> {"fa":3, "fc":point("4.0,5.0")}
>>>>>>> {code}
>>>>>>> The insert statement works fine. The error happens in the "load"
>>>>>>> statement only:
>>>>>>> {code}
>>>>>>> Caused by: org.apache.hyracks.algebricks.common.exceptions.NotImplementedException: Value provider for type NULL is not implemented
>>>>>>> at org.apache.asterix.dataflow.data.nontagged.valueproviders.AqlPrimitiveValueProviderFactory$1.getValue(AqlPrimitiveValueProviderFactory.java:64)
>>>>>>> at org.apache.hyracks.storage.am.rtree.frames.RTreeNSMFrame.adjustMBRImpl(RTreeNSMFrame.java:132)
>>>>>>> at org.apache.hyracks.storage.am.rtree.frames.RTreeNSMFrame.adjustMBR(RTreeNSMFrame.java:153)
>>>>>>> at org.apache.hyracks.storage.am.rtree.impls.RTree$RTreeBulkLoader.propagateBulk(RTree.java:954)
>>>>>>> at org.apache.hyracks.storage.am.rtree.impls.RTree$RTreeBulkLoader.end(RTree.java:937)
>>>>>>> at org.apache.hyracks.storage.am.lsm.rtree.impls.LSMRTree$LSMRTreeBulkLoader.end(LSMRTree.java:584)
>>>>>>> at org.apache.hyracks.storage.am.common.dataflow.IndexBulkLoadOperatorNodePushable.close(IndexBulkLoadOperatorNodePushable.java:107)
>>>>>>> ... 7 more
>>>>>>> {code}
>>>>>>> The BTree index works fine if I remove the RTree index.
>>>>>>
>>>>>> --
>>>>>> This message was sent by Atlassian JIRA
>>>>>> (v6.3.4#6332)
