[ https://issues.apache.org/jira/browse/DATAFU-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthew Hayes closed DATAFU-127.
--------------------------------
    Resolution: Fixed

> New macro - sample by keys
> --------------------------
>
>                 Key: DATAFU-127
>                 URL: https://issues.apache.org/jira/browse/DATAFU-127
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Eyal Allweil
>            Assignee: Eyal Allweil
>            Priority: Major
>              Labels: macro
>             Fix For: 1.4.1
>
>         Attachments: DATAFU-127.patch
>
>
> Two macros that return a sample of a larger table based on a list of keys,
> with the schema of the larger table. One of the macros filters by dates, the
> other doesn't.
> If multiple rows share a key that appears in the key list, all of them are
> returned (no deduplication is done). The results are returned ordered by the
> key field, in a single file.
> The implementation uses a replicated join for efficiency, so the key list
> must be small enough to fit in memory.
> The first macro's definition looks as follows:
> DEFINE sample_by_keys(table, sample_set, join_key_table, join_key_sample)
> returns out {
> - table_name - table name to sample
> - sample_set - a set of keys
> - join_key_table - join column name in the table
> - join_key_sample - join column name in the sample

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
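
For readers unfamiliar with replicated joins, the behaviour described in the issue can be approximated by hand in plain Pig Latin. This is only a minimal sketch under assumptions, not the macro body from DATAFU-127.patch: the relation names, paths, and fields (`big_table`, `keys`, `id`, `value`, `key`) are hypothetical and do not come from the issue.

```pig
-- Hypothetical inputs: a large table and a small list of keys.
big_table = LOAD 'input/big_table' AS (id:chararray, value:int);
keys      = LOAD 'input/keys'      AS (key:chararray);

-- Replicated (map-side) join: the LAST relation listed is loaded into
-- memory on every mapper, so the key list has to be the small side.
joined = JOIN big_table BY id, keys BY key USING 'replicated';

-- Keep only the columns of the large table, so the output has its schema.
-- No DISTINCT is applied: every row matching a listed key is kept.
projected = FOREACH joined GENERATE
    big_table::id    AS id,
    big_table::value AS value;

-- Order by the key field; a single reducer yields a single output file.
sampled = ORDER projected BY id PARALLEL 1;

STORE sampled INTO 'output/sample';
```

Because the join is an inner join, a key that occurs more than once in the key list would repeat the matching rows. Whether the actual macro guards against that is not stated in the issue.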