Matthias Boehm created SYSTEMML-951:
---------------------------------------
Summary: Efficient spark right indexing via lookup
Key: SYSTEMML-951
URL: https://issues.apache.org/jira/browse/SYSTEMML-951
Project: SystemML
Issue Type: Task
Components: Runtime
Reporter: Matthias Boehm
So far, all versions of the Spark right indexing instructions require a full
scan over the data set. If the data is already partitioned (which happens
anyway for any conversion from an external format to binary block), such a
full scan is unnecessary when we are only interested in a small subset of the
data. This task adds an efficient right indexing operation via 'rdd lookups',
which access at most <num_lookup> partitions given existing hash partitioning.
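
The idea can be illustrated with a small, self-contained simulation. This is
only a sketch under stated assumptions, not the SystemML implementation and
not Spark code: the block size, the number of partitions, the toy hash
function, and all function names below are hypothetical. It relies on the
general property that a key lookup on a hash-partitioned pair RDD only has to
visit the one partition the key hashes to, so a right indexing request that
touches k blocks visits at most k partitions.

```python
# Plain-Python simulation of lookup-based right indexing over
# hash-partitioned matrix blocks. All names and constants are
# hypothetical; no Spark or SystemML API is used here.

BLOCK_SIZE = 1000      # assumed side length of a square block
NUM_PARTITIONS = 8     # assumed number of hash partitions


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Toy hash partitioner: map a (row_block, col_block) key to a partition id."""
    row_block, col_block = key
    return (31 * row_block + col_block) % num_partitions


def build_partitions(num_row_blocks, num_col_blocks):
    """Distribute placeholder blocks over partitions by hashing their keys."""
    partitions = {pid: {} for pid in range(NUM_PARTITIONS)}
    for rb in range(1, num_row_blocks + 1):
        for cb in range(1, num_col_blocks + 1):
            key = (rb, cb)
            partitions[partition_of(key)][key] = "block%s" % (key,)
    return partitions


def block_keys_for_range(rl, ru, cl, cu, block_size=BLOCK_SIZE):
    """Return the 1-based block keys overlapping the 1-based, inclusive
    cell range [rl:ru, cl:cu]."""
    first_rb, last_rb = (rl - 1) // block_size + 1, (ru - 1) // block_size + 1
    first_cb, last_cb = (cl - 1) // block_size + 1, (cu - 1) // block_size + 1
    return [(rb, cb)
            for rb in range(first_rb, last_rb + 1)
            for cb in range(first_cb, last_cb + 1)]


def right_index_via_lookup(partitions, rl, ru, cl, cu):
    """Fetch only the blocks needed for the range, one keyed lookup per block.
    Returns the fetched blocks and the set of partition ids that were visited.
    Slicing inside the boundary blocks is omitted for brevity."""
    result, touched = {}, set()
    for key in block_keys_for_range(rl, ru, cl, cu):
        pid = partition_of(key)          # only this partition is accessed
        touched.add(pid)
        result[key] = partitions[pid][key]
    return result, touched


# Example: a 10 x 10 grid of blocks, indexing X[1001:2000, 1:1500].
partitions = build_partitions(10, 10)
blocks, touched = right_index_via_lookup(partitions, 1001, 2000, 1, 1500)
print(sorted(blocks), sorted(touched))
```

In this example the request overlaps two blocks, so at most two of the eight
partitions are visited, whereas a scan-based approach would read all of them.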