[
https://issues.apache.org/jira/browse/TAJO-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182678#comment-14182678
]
Hyoungjun Kim commented on TAJO-1123:
-------------------------------------
1. I think the current Fragment interface is sufficient for the functions
listed below.
Fragment is used in the following functions:
- Splitting a subquery into tasks.
-> It uses the abstracted getSplit() method of StorageManager.
- Partitioning by the fragment length (join or group by)
- Finding the best aggregation or join plan (hash or sort)
- Finding the best join order
-> These use the abstracted getLength() method of Fragment.
- Assignment to the host.
-> It uses the abstracted getHosts() method of Fragment.
- Scanner
-> Because each scanner runs for a specific storage, the Scanner already knows
what the concrete fragment class is.
Some storages may not have location or length information. In that case
Tajo should run with default values, but this is not implemented yet.
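The following is a minimal sketch of how a planner could depend only on the
abstract length and host accessors, and fall back to defaults when a storage
cannot report them. Everything here is hypothetical and for illustration only:
SketchFragment, SketchFileFragment, SketchOpaqueFragment, SketchPlanner,
UNKNOWN_LENGTH and DEFAULT_FRAGMENT_LENGTH are assumed names, not Tajo's actual
classes, and the 64 MB default is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an abstract fragment; NOT Tajo's actual interface.
interface SketchFragment {
    long UNKNOWN_LENGTH = -1L;

    String getTableName();

    // Storages that cannot report a size keep this default.
    default long getLength() { return UNKNOWN_LENGTH; }

    // Storages without locality information report no preferred hosts.
    default List<String> getHosts() { return Collections.emptyList(); }
}

// A file-like fragment that knows both its length and its hosts.
final class SketchFileFragment implements SketchFragment {
    private final String table;
    private final long length;
    private final List<String> hosts;

    SketchFileFragment(String table, long length, List<String> hosts) {
        this.table = table;
        this.length = length;
        this.hosts = Collections.unmodifiableList(new ArrayList<>(hosts));
    }

    @Override public String getTableName() { return table; }
    @Override public long getLength() { return length; }
    @Override public List<String> getHosts() { return hosts; }
}

// A fragment of a storage that exposes neither length nor location.
final class SketchOpaqueFragment implements SketchFragment {
    private final String table;

    SketchOpaqueFragment(String table) { this.table = table; }

    @Override public String getTableName() { return table; }
}

final class SketchPlanner {
    // Assumed default used when a fragment cannot report its length.
    static final long DEFAULT_FRAGMENT_LENGTH = 64L * 1024 * 1024;

    // Sums fragment lengths, substituting the default for unknown ones.
    static long estimateVolume(List<? extends SketchFragment> fragments) {
        long total = 0;
        for (SketchFragment f : fragments) {
            long len = f.getLength();
            total += (len == SketchFragment.UNKNOWN_LENGTH) ? DEFAULT_FRAGMENT_LENGTH : len;
        }
        return total;
    }

    // Prefers a worker listed by the fragment; otherwise takes the first worker.
    static String pickHost(SketchFragment f, List<String> workers) {
        for (String h : f.getHosts()) {
            if (workers.contains(h)) return h;
        }
        return workers.get(0);
    }
}
```

With this shape, the planner and the host assignment never need to know the
concrete fragment class; only the scanner for a given storage would.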
2. Tajo already has an extensible Fragment. FragmentProto has the 'contents'
field, which preserves the serialized fragment value.
FragmentConvertor creates the concrete fragment instance from that field.
I am going to upload a patch that contains HBaseStorageManager,
HBaseFragment and HBaseFragmentProto.
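The 'contents' approach in point 2 can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the real FragmentProto or
FragmentConvertor: the class names (SketchFragmentEnvelope, DecodedFragment,
SketchHBaseFragment, SketchFragmentConverter), the storeType key, the
table/startRow/stopRow fields and the newline-separated encoding are all
hypothetical, and plain byte arrays stand in for protobuf messages.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical marker for any storage-specific fragment produced by decoding.
interface DecodedFragment {}

// Hypothetical envelope: a storage type plus opaque serialized 'contents'.
final class SketchFragmentEnvelope {
    final String storeType;
    final byte[] contents;

    SketchFragmentEnvelope(String storeType, byte[] contents) {
        this.storeType = storeType;
        this.contents = contents.clone();
    }
}

// Hypothetical storage-specific fragment; the fields are assumptions.
final class SketchHBaseFragment implements DecodedFragment {
    final String table;
    final String startRow;
    final String stopRow;

    SketchHBaseFragment(String table, String startRow, String stopRow) {
        this.table = table;
        this.startRow = startRow;
        this.stopRow = stopRow;
    }

    // Toy encoding (newline-separated fields); a real system would use protobuf.
    byte[] serialize() {
        return (table + "\n" + startRow + "\n" + stopRow).getBytes(StandardCharsets.UTF_8);
    }

    static SketchHBaseFragment parse(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\n", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed fragment contents");
        }
        return new SketchHBaseFragment(parts[0], parts[1], parts[2]);
    }
}

// Maps a storage type to the decoder that rebuilds its concrete fragment.
final class SketchFragmentConverter {
    private final Map<String, Function<byte[], ? extends DecodedFragment>> decoders = new HashMap<>();

    void register(String storeType, Function<byte[], ? extends DecodedFragment> decoder) {
        decoders.put(storeType, decoder);
    }

    DecodedFragment convert(SketchFragmentEnvelope envelope) {
        Function<byte[], ? extends DecodedFragment> decoder = decoders.get(envelope.storeType);
        if (decoder == null) {
            throw new IllegalArgumentException("no decoder for " + envelope.storeType);
        }
        return decoder.apply(envelope.contents);
    }
}
```

In this sketch a new storage only has to register its own decoder; the
envelope and the converter stay unchanged.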
> Use Fragment instead of FileFragment.
> -------------------------------------
>
> Key: TAJO-1123
> URL: https://issues.apache.org/jira/browse/TAJO-1123
> Project: Tajo
> Issue Type: Sub-task
> Reporter: Hyoungjun Kim
> Assignee: Hyoungjun Kim
> Priority: Minor
>
> Currently most operators and the planner use FileFragment objects for
> splitting data. FileFragment only has information about the file to be
> scanned. In order to support various storages, this should be changed to
> the abstract object 'Fragment'.