[
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177321#comment-16177321
]
Zhiyuan Yang commented on TEZ-3841:
-----------------------------------
Welcome to the Tez community, [~SolalPirelli]! Did you send an email to the dev
mailing list? You are likely to get more replies there, since not everyone
follows JIRA updates.
The idea looks interesting, but I suggest elaborating on the requirements, for
example the scope of the performance issue (CPU-wise, event-wise, or
network-wise), because this really affects how you will implement it. Right now
you can already simulate many things without a fake task launcher or task
communicator. Faking Input/Processor/Output is already done in tests, and
scheduling and event routing can be specified via a custom vertex/edge manager.
Implementation-wise, you might want to consider implementing it as a plugin. The
current task launcher, task communicator, and task scheduler are all pluggable.
How to make it as close to reality as possible is another challenge, and it
depends on the scope of the issue you want to solve. I noticed you use a thread
to run each task. This may not work well if you want to try a very large scale
(like tens of thousands of containers).
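To illustrate the scaling concern with one thread per task: one possible
alternative (not something the patch or Tez provides, just a sketch) is to
advance a virtual clock instead of running real threads. The class and method
names below are hypothetical, and the model assumes tasks are assigned greedily,
in order, to whichever container slot frees up first.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: simulate task execution on a virtual clock so that
// tens of thousands of simulated containers need no real threads.
final class VirtualClockSimulator {

    // Returns the simulated time (in ms) at which the last task finishes when
    // tasks are assigned, in order, to the slot that becomes free earliest.
    static long simulateMakespan(long[] taskDurationsMs, int slots) {
        if (slots <= 0) {
            throw new IllegalArgumentException("slots must be positive");
        }
        // Each entry is the virtual time at which one container slot is free.
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < slots; i++) {
            freeAt.add(0L);
        }
        long makespan = 0L;
        for (long duration : taskDurationsMs) {
            long start = freeAt.poll();   // earliest available slot
            long end = start + duration;  // task occupies it until 'end'
            freeAt.add(end);
            makespan = Math.max(makespan, end);
        }
        return makespan;
    }
}
```

Under these assumptions, 40,000 tasks of 100 ms each finish at a simulated
200,000 ms on 20 slots (2 nodes with 10 containers each) and at 133,400 ms on 30
slots, without creating a single extra thread.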
> Proposal: Simulator mode
> ------------------------
>
> Key: TEZ-3841
> URL: https://issues.apache.org/jira/browse/TEZ-3841
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Solal Pirelli
> Attachments: tez-fake-mode.patch
>
>
> Early work on a new feature proposal: a "simulator" mode in which vertices
> are not actually executed, but instead use a simplified "fake" processor
> (which is configurable, and by default does nothing) to let a developer see
> how certain workloads will be handled.
> For instance, one might want to check what happens if a vertex has each of
> its 1000 tasks send a bunch of events - does this scale? Or, what if a
> specific vertex fails 2% of the time - how does this impact overall graph
> execution? Are 2 nodes with 10 containers per node enough, or should one
> invest in a third node?
> My current implementation is pretty simple: mimic the "uber" stuff to add a
> new "fake" mode with a custom task scheduler and container launcher. It adds
> the following configuration values:
> * Boolean to enable fake mode
> * Number of nodes in fake mode
> * Number of containers per node in fake mode
> * Class to run in fake mode - must inherit a new class `FakeProcessor`, with
> a single method `run` that takes the vertex name, task index and task
> attempt, and returns a list of events. Throwing an exception causes the task
> to fail.
> I'm currently working on adding a "chaos monkey" kind of service which
> randomly kills tasks, pre-empts containers, etc., but would appreciate some
> feedback on what's already done first. :)
> P.S.: I have zero experience with using JIRA or contributing to Apache
> projects; if there is a more formal procedure for suggesting a new feature,
> please point me to it.
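For readers without the patch at hand, the `FakeProcessor` contract described in
the proposal might look roughly like the sketch below. Only the class name
`FakeProcessor` and the `run` method with its three arguments come from the
description; `FakeEvent`, `NoOpFakeProcessor`, `FlakyFakeProcessor`, and the use
of unchecked exceptions are assumptions made for illustration and may differ
from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for an event; the actual patch presumably returns
// Tez's own event types.
final class FakeEvent {
    final String description;

    FakeEvent(String description) {
        this.description = description;
    }
}

// Assumed shape of the base class: one method that receives the vertex name,
// task index and task attempt, and returns the events the task would send.
// Throwing an exception marks the simulated task attempt as failed.
abstract class FakeProcessor {
    abstract List<FakeEvent> run(String vertexName, int taskIndex, int taskAttempt);
}

// Default behaviour as described in the proposal: do nothing.
final class NoOpFakeProcessor extends FakeProcessor {
    @Override
    List<FakeEvent> run(String vertexName, int taskIndex, int taskAttempt) {
        return new ArrayList<>();
    }
}

// Illustrative processor that fails a configurable fraction of attempts,
// e.g. 0.02 for the "fails 2% of the time" scenario.
final class FlakyFakeProcessor extends FakeProcessor {
    private final double failureRate;
    private final Random random;

    FlakyFakeProcessor(double failureRate, long seed) {
        this.failureRate = failureRate;
        this.random = new Random(seed);
    }

    @Override
    List<FakeEvent> run(String vertexName, int taskIndex, int taskAttempt) {
        if (random.nextDouble() < failureRate) {
            throw new RuntimeException("Simulated failure of " + vertexName
                    + " task " + taskIndex + " attempt " + taskAttempt);
        }
        List<FakeEvent> events = new ArrayList<>();
        events.add(new FakeEvent(vertexName + ":" + taskIndex + " finished"));
        return events;
    }
}
```

Passing a seed keeps failures reproducible between simulator runs, which makes
it easier to compare how different node and container counts cope with the same
failure pattern.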