[ 
https://issues.apache.org/jira/browse/BEAM-7268?focusedWorklogId=241028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-241028
 ]

ASF GitHub Bot logged work on BEAM-7268:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/May/19 11:44
            Start Date: 13/May/19 11:44
    Worklog Time Spent: 10m 
      Work Description: kanterov commented on issue #8552: [BEAM-7268] make 
sorter extension Hadoop-free
URL: https://github.com/apache/beam/pull/8552#issuecomment-491788531
 
 
   It's great to reduce dependencies. If dependency conflicts are the only 
concern, an alternative could be shading the Hadoop dependencies. Without 
benchmarks, it's hard to say whether this code will perform better. I would 
suggest adding it as an alternative sorting implementation and keeping the 
previous one.
   
   If I'm not mistaken, your work for `SortValues` is needed to express 
`GroupByKeyAndSortValues` as `GroupByKey+SortValues`. There is a thread on the 
mailing list about adding `GroupByKeyAndSortValues` as a primitive transform in 
Beam [gbk-sort-values]. I would rather go that way: runners such as Dataflow, 
Spark, and Flink could override it with an efficient implementation because 
they have a concept of secondary sorting, which would be far more efficient 
than anything else.
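
   For readers unfamiliar with the decomposition, here is a minimal plain-Java 
sketch of the semantics of `GroupByKey` followed by a per-key sort on a 
secondary key. It is an in-memory illustration only: the class and method names 
(`GroupAndSortSketch`, `groupByKeyAndSortValues`, `row`) are hypothetical, it 
does not use the Beam API, and it says nothing about how the external sorter or 
a runner's secondary sorting is actually implemented.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names are Beam APIs.
final class GroupAndSortSketch {

    // Step 1 (the GroupByKey part): collect the (secondaryKey, value) pairs
    // that share a primary key.
    // Step 2 (the SortValues part): sort each group by its secondary key.
    static <K, S extends Comparable<S>, V> Map<K, List<Map.Entry<S, V>>> groupByKeyAndSortValues(
            List<Map.Entry<K, Map.Entry<S, V>>> input) {
        Map<K, List<Map.Entry<S, V>>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, Map.Entry<S, V>> element : input) {
            grouped.computeIfAbsent(element.getKey(), k -> new ArrayList<>())
                   .add(element.getValue());
        }
        for (List<Map.Entry<S, V>> values : grouped.values()) {
            values.sort(Map.Entry.comparingByKey());
        }
        return grouped;
    }

    // Hypothetical helper that builds one (primaryKey, (secondaryKey, value)) element.
    static <K, S, V> Map.Entry<K, Map.Entry<S, V>> row(K key, S secondary, V value) {
        Map.Entry<S, V> inner = new SimpleImmutableEntry<>(secondary, value);
        return new SimpleImmutableEntry<>(key, inner);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Map.Entry<Integer, String>>> input = Arrays.asList(
                row("a", 2, "y"),
                row("b", 1, "z"),
                row("a", 1, "x"));
        // Prints {a=[1=x, 2=y], b=[1=z]}
        System.out.println(groupByKeyAndSortValues(input));
    }
}
```

   The sketch sorts each group only after it has been fully materialized in 
memory. A runner with native secondary sorting could instead deliver the values 
already ordered, which is the efficiency argument made above.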
   
   What do you think about investing in `GroupByKeyAndSortValues` instead, 
since, in the end, that's what we need? It might be better to move this 
discussion to the mailing list.
   
   cc @kennknowles @reuvenlax 
   
   [gbk-sort-values]: 
https://lists.apache.org/thread.html/313934c8543ae84d541653050a1bc77b5550b4a8262afd51e5695365@%3Cdev.beam.apache.org%3E
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 241028)
    Time Spent: 20m  (was: 10m)

> Make external sorter Hadoop free
> --------------------------------
>
>                 Key: BEAM-7268
>                 URL: https://issues.apache.org/jira/browse/BEAM-7268
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-ideas
>    Affects Versions: 2.12.0
>            Reporter: Neville Li
>            Assignee: Neville Li
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now the Java sorter extension depends on Hadoop SequenceFile for 
> external sort. It'll be nice to re-implement it without the dependency to 
> avoid conflicts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
