+1 for finishing the port to Java ahead of anything else - it will be a
significant milestone. I have a JIRA assigned to me for the porting, and I
will work on it for the 2.0 release.

It’s a priority to guarantee no performance regressions. As part of this
endeavor, we should explore an automated (or easy) way to run the major
performance benchmarks and assert on their results. Ideally, any contributor
should be able to fairly easily test the impact of a change under certain
performance test scenarios.
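
A minimal sketch of what such an automated check could look like, assuming
per-scenario throughput numbers have already been collected by some benchmark
harness. The class name, the scenario names, the numbers and the 5% tolerance
are all hypothetical and are not part of Storm:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Storm: compares per-scenario throughput
// from a candidate build against a recorded baseline.
public class BenchmarkRegressionCheck {

    // True when the candidate is no more than maxDropFraction below baseline.
    public static boolean withinTolerance(double baseline, double candidate,
                                          double maxDropFraction) {
        if (baseline <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return candidate >= baseline * (1.0 - maxDropFraction);
    }

    // Checks every baseline scenario; a scenario missing from the candidate
    // run is reported as a failure rather than silently skipped.
    public static Map<String, Boolean> check(Map<String, Double> baseline,
                                             Map<String, Double> candidate,
                                             double maxDropFraction) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : baseline.entrySet()) {
            Double measured = candidate.get(e.getKey());
            results.put(e.getKey(), measured != null
                    && withinTolerance(e.getValue(), measured, maxDropFraction));
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up numbers (tuples/sec), purely for illustration.
        Map<String, Double> baseline = new LinkedHashMap<>();
        baseline.put("word-count", 100_000.0);
        baseline.put("kafka-passthrough", 250_000.0);

        Map<String, Double> candidate = new LinkedHashMap<>();
        candidate.put("word-count", 98_000.0);
        candidate.put("kafka-passthrough", 200_000.0);

        check(baseline, candidate, 0.05).forEach((scenario, ok) ->
                System.out.println(scenario + ": " + (ok ? "OK" : "REGRESSION")));
    }
}
```

Something along these lines could run in CI or locally; how the numbers are
produced and what tolerance is acceptable would still need to be agreed on.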

Beam Runner work should take into account the impact of incorporating new
JStorm features and the Storm Worker
Redesign<https://issues.apache.org/jira/browse/STORM-2284>. It is not very
efficient to start on it only to find out that it has to change in the face of
the Storm and worker redesign. That is, it should be done after its building
blocks are stable.

Thanks,
Hugo

On Mar 24, 2017, at 12:07 AM, Arun Mahadevan 
<ar...@apache.org<mailto:ar...@apache.org>> wrote:

+1 to release with the porting completed. I think it’s mainly the UI server
and log viewer that are pending.

We can start doing the regression and performance tests for whatever is already 
ported.

If anyone is running the master branch in their pre-prod / prod environments,
it would be good to know; that would give us more confidence.

The other features can be added in follow up releases.

Regards,
Arun


On 3/24/17, 11:47 AM, "Satish Duggana" 
<satish.dugg...@gmail.com<mailto:satish.dugg...@gmail.com>> wrote:

+1 to have 2.0 with porting and performance (it should be at least as good
as the 1.x release) issues addressed.

We can target other tasks (mentioned by Taylor and Jungtaek) for the 2.x
branch.


Exactly-once support:
While thinking through the exactly-once support design, it became clear that
it is better to avoid acking tuples and implement exactly-once by snapshotting
with barriers. It seems the JStorm folks followed a similar design; they claim
it gives better performance. This feature is essential for the Beam runner,
though we can still decide on the respective approaches.

Beam Runner
Let's hold off on this for now and keep it in Storm till 2.x. We should avoid
building a minimal Beam runner in haste. It is better to address STORM-2284,
exactly-once and other windowing enhancements first to enable the Beam runner.

JStorm
Agree with Jungtaek on looking at the latest JStorm and aligning/scoping its
features for 2.x.

STORM-2284
We may want to look at the JStorm worker before working on the respective
components in this epic, to pull in appropriate enhancements.

YARN/MESOS
Supporting Storm on YARN/Mesos for 2.x.

Thanks,
Satish.


On Fri, Mar 24, 2017 at 9:09 AM, Jungtaek Lim 
<kabh...@gmail.com<mailto:kabh...@gmail.com>> wrote:

First of all, +1 to completing only the port work, doing a sanity check
(including performance regression), and releasing.

If we can get STORM-2284 within a deterministic time frame (say 2~3 months)
that would be great, but if not I'd be in favor of postponing it to a later
2.x release.

JStorm released new versions after the code donation, so there are more
things we could get ideas from, or even adopt.
https://github.com/alibaba/jstorm/blob/master/history.md
As you can see from the release notes link, we also need to update phase 2,
since they have already changed what we're planning to do in phase 2. For
example, they changed backpressure to end-to-end, and switched to snapshots
rather than the acker.
To be fair, JStorm also pulled many features from today's Storm, like Flux,
windowing, more shuffle groupings, log search, log level change, and so on.

STORM-2426 <https://issues.apache.org/jira/browse/STORM-2426> is due to the
limitation of the Spout lifecycle (everything is done in a single thread),
and STORM-1358 <https://issues.apache.org/jira/browse/STORM-1358> (JStorm's
multi-thread Spout) can remedy this (although Spout implementations may then
need to guarantee thread-safety). It's not just an improvement but closer to
a design concern, so I would like to address it sooner than other things in
phase 2.

On the Storm SQL side, I've lost progress, but the major work would be
adopting group by with windowing. It was not available in Calcite but will be
available in the next release (1.12.0).
I've filed this as STORM-2405
<https://issues.apache.org/jira/browse/STORM-2405>, but windowing with
micro-batching is not intuitive, so I would like to change the underlying API
to the stream API in SQL. I also filed this as STORM-2406
<https://issues.apache.org/jira/browse/STORM-2406>.

Just my 2 cents, btw: I hope to see Metrics V2 sooner, since we lose metrics
even during normal operations like restarting a worker, rebalancing, and so
on. Eventually we will need to deal with dynamic scaling, and then metrics
will break often.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Fri, Mar 24, 2017 at 5:05 AM, Harsha Chintalapani <st...@harsha.io> wrote:

Storm 2.0's migration to Java is in itself a big win and would attract a
wider community and adoption. So my vote would be to resolve the first 3
items to get a release out.
All the other features mentioned are great to have but shouldn't be
blockers for the 2.0 release.

-Harsha

On Thu, Mar 23, 2017 at 11:51 AM P. Taylor Goetz <ptgo...@gmail.com>
wrote:

With the 1.1.0 release nearing completion, I’d like to turn our attention
to 2.0 and develop a plan for what features, etc. to include.

The following 3 are what I feel are the minimum for a 2.0 release. These
could likely be resolved relatively quickly:

* Performance - I’ve not benchmarked the master branch vs. 1.0.x or 1.1.x
in a while, but I feel it will be important to make sure there are no
performance regressions, and would hope that we actually have a performance
improvement over previous versions. To that end (e.g. if there is in fact a
performance regression), the proposals that Roshan Naik put together for
revising the threading and execution model (STORM-2307) and replacing
Disruptor with JCTools (STORM-2306) warrant review and consideration. See
also STORM-2284, which is the parent JIRA.

* Finish porting Storm UI to java (STORM-1311)

* Finish porting log viewer to java (STORM-1280)

The following are items that are nice to have in 2.0, but I don’t feel are
absolutely necessary for an initial 2.0 release:

* Beam Runner (I wouldn’t tie this to 2.0, mentioning it because it’s
relevant) - Initially there seemed to be a lot of interest in this, but
that seems to have trailed off. I spoke with some Beam developers and there
seems to be interest from that community as well. Do we want to move that
effort to the Beam community, or keep it here? Moving it to the Beam
community might lead to better collaboration between projects.

* Bounded Spouts (needed for Beam Runner implementation) - Currently
spouts are unbounded: there is no end to the stream. Beam has the concept of
bounded sources (roughly analogous to batch processing). To support that,
we would need to implement a similar concept in Storm. One benefit of such
a feature would be the ability to handle both bounded and unbounded
workflows in Storm.

* Storm-SQL - Jungtaek/Xin: You have been the primary drivers behind this
effort. What improvements do you envision for 2.0?

* Metrics V2 (STORM-2153: Coda Hale Metrics) — I’ve been targeting this
for 1.2.0, but it’s designed to be easily portable to master/2.0.

* JStorm Migration - Original outline can be found here [1]. Note that a lot
of the associated JIRAs below are assigned, but there hasn’t been any recent
activity or pull requests; we should probably consider them unassigned and
up for grabs:

* Worker Classloader Isolation (STORM-1338) - Lack of this has been the
bane of a lot of Storm users almost since day one. We have largely
addressed it by shading/relocating dependencies. It would be great to see
this addressed once and for all.

* JStorm back pressure implementation (STORM-1324) - The current back
pressure implementation leaves a bit to be desired, and the JStorm approach
looks promising, though it also depends on the JStorm concept of “topology
master” (STORM-1323), which may have some implications regarding security.

* Dynamic Topology Updates (STORM-1335) - This would provide a command to
update topology jars and configuration without stopping the topology, and
is well suited to leverage the blobstore. The restart command (that can
also update the topology configuration) also looks compelling (STORM-1334).

* Additional Scheduler Implementations (STORM-1320)

* Additional Grouping Implementations (STORM-1328)


As always I’m open to any opinions and suggestions.

-Taylor

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61328109








