The TRAFODION-2001 changes have reached the point where I believe they are
ready to merge to the Trafodion baseline as part of the Trafodion R2.2
objectives. So consider this a petition to the Trafodion developer community
for approval to merge into the mainline code base.
To be clear, this is a substantial change to the Trafodion Foundation
components and has a direct effect on Trafodion configuration,
installation, and operational capabilities. These changes establish a
fundamental shift going forward: they will allow Trafodion to align more
closely with the elasticity capabilities of the Hadoop stack.
I have attached the original email I sent a year ago, along with a set of slides
summarizing these changes.
In addition, here is the pull request, which contains the changes and the
Jenkins test results:
https://github.com/apache/incubator-trafodion/pull/1077
These changes do not complete the elasticity story, but they implement the
infrastructure needed to complete it. So there is more to come, such as support
in the Python installation scripts, Cloudera Manager parcel installation, Ambari
installation, the SQL engine, and other Trafodion components to take advantage
of this capability.
If you have any questions, please add them as comments in the JIRA
(https://issues.apache.org/jira/browse/TRAFODION-2001) so that they can be
captured and addressed. Also, you will find an updated document with more
details in the JIRA.
Regards,
Zalo
From: Zalo Correa
Sent: Thursday, May 4, 2017 11:15 AM
To: '[email protected]' <[email protected]>
Subject: TRAFODION-2001
I would like to draw your attention to a fairly substantial change that I intend
to commit and merge into the Apache Incubator Trafodion code base. The changes
are described in https://issues.apache.org/jira/browse/TRAFODION-2001 and the
code changes are currently in
https://github.com/apache/incubator-trafodion/pull/1077.
A little background: the design and most of the implementation were done in the
spring of 2015 and donated to the Apache Foundation at the end of September
2015. I have worked on it for the past year among my other tasks and have
reached a point where I think it will be ready to merge into the mainline code
base after more thorough testing.
The current status is that the Jenkins tests pass with one exception:
core-regress-core-hdp (https://jenkins.esgyn.com/job/core-regress-core-hdp/2204/)
fails at test core TEST116.
However, this test also appears to fail in my test environment without the
TRAFODION-2001 changes.
Real cluster testing (functional, HA, and performance) is in progress.
As I mentioned above, the changes are substantial and your active code review
participation would be most helpful in getting this needed functionality merged
to the Apache Incubator Trafodion code base.
Please use the JIRA and/or the pull request as the communication vehicle for
this activity.
Thank you in advance,
Zalo
Gonzalo Correa
Esgyn Corporation
[email protected]
--- Begin Message ---
I would like to draw your attention to JIRA
TRAFODION-2001 (https://issues.apache.org/jira/browse/TRAFODION-2001), which
specifies changes in configuration and operational components to support
elasticity in Trafodion. My intent is to generate discussion, obtain feedback,
correct mistakes, add missing items, and reach consensus on when to integrate
these changes into the mainline code. Inherent in this capability is the
likelihood that other aspects of managing a Trafodion instance will require
changes and possibly enhancements. At a minimum, these enhancements change the
way key process components are currently configured and managed, and the old
way goes away (so if you are an active contributor to Trafodion, you will want
to know the details of this JIRA).
I am adding the contents of this email as an initial comment in the
TRAFODION-2001 JIRA and request that all feedback be provided as comments in the
JIRA. I thank you in advance.
A little background: most of the implementation was done in the spring of 2015
and donated to the Apache Foundation at the end of September 2015. I am in the
process of merging these changes to the current Trafodion baseline in my
private fork.
Here is where I need your active participation. To help with that, here is a
brief summary:
First, review the document attached to the TRAFODION-2001 JIRA
(https://issues.apache.org/jira/browse/TRAFODION-2001), as you will need its
context for what follows here.
Current state:
Trafodion Foundation components:
'monitor/shell':
* 'persist config/exec/info' commands are implemented
  o A 'persist kill' command is not currently specified. I believe this is an
    unintended omission that needs to be corrected: without it the story is
    incomplete, because persistent processes whose number grows and shrinks
    with node membership cannot otherwise be stopped with one simple command.
  o Some important items to consider with a 'persist kill' command:
    - It will return an error when used with DTM persistent processes (the
      transaction manager process should not be stopped in a haphazard way).
    - Are there other persistent processes that should also be protected in
      this manner?
    - Should it return an error with TSID persistent processes?
  o Implementing the 'persist kill' command corrects a problem with the code
    generated for 'sscpstop' and 'ssmpstop':
    - The currently generated code does not account for new processes created
      when nodes are added.
* 'node config' command is implemented
* 'node add/delete' commands: TODO (in progress)
'scripts' changes implemented:
* Compilation of the Trafodion configuration file, 'sqconfig', with the new
  'persist' section is implemented ('sqgen' et al. scripts)
  o The generation of 'gomon.cold' is greatly simplified, as are the
    '<xxx>start' scripts
* Creation and display of the configuration database is implemented
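To make the 'persist kill' proposal above concrete, here is a minimal Python
sketch of the intended semantics under stated assumptions; it is not the
monitor's implementation. Everything in it is hypothetical and for illustration
only: the Cluster class, the '<TYPE>-<node>' instance naming, the
PROTECTED_TYPES set, and the function names. It contrasts a stop list generated
once from the initial node membership (the 'sscpstop'/'ssmpstop' problem) with
a kill that reads membership at call time and refuses to stop DTM processes.

```python
from dataclasses import dataclass, field

# Hypothetical: persistent process types that 'persist kill' refuses to stop.
# DTM is protected per the proposal; TSID (and others) are still open questions.
PROTECTED_TYPES = {"DTM"}


class PersistKillError(Exception):
    """Raised when 'persist kill' targets a protected process type."""


@dataclass
class Cluster:
    # node id -> names of persistent process instances running on that node
    procs: dict[int, set[str]] = field(default_factory=dict)

    def add_node(self, node: int, types: list[str]) -> None:
        # Hypothetical naming: one instance per type per node, e.g. "SSCP-2".
        self.procs[node] = {f"{t}-{node}" for t in types}


def static_stop_list(cluster: Cluster, proc_type: str) -> list[str]:
    """Models a stop script generated once, from the membership at generation time."""
    return [f"{proc_type}-{n}" for n in sorted(cluster.procs)]


def persist_kill(cluster: Cluster, proc_type: str) -> list[str]:
    """Stops every instance of proc_type on the nodes that are members now."""
    if proc_type in PROTECTED_TYPES:
        raise PersistKillError(f"'persist kill' is not allowed for {proc_type}")
    killed = []
    for node in sorted(cluster.procs):  # membership is read at call time
        name = f"{proc_type}-{node}"
        if name in cluster.procs[node]:
            cluster.procs[node].discard(name)
            killed.append(name)
    return killed


if __name__ == "__main__":
    cluster = Cluster()
    for n in (0, 1):
        cluster.add_node(n, ["DTM", "SSCP", "SSMP"])
    stale = static_stop_list(cluster, "SSCP")  # generated before node 2 joins
    cluster.add_node(2, ["DTM", "SSCP", "SSMP"])
    print(stale)                          # ['SSCP-0', 'SSCP-1'] (misses node 2)
    print(persist_kill(cluster, "SSCP"))  # ['SSCP-0', 'SSCP-1', 'SSCP-2']
    try:
        persist_kill(cluster, "DTM")
    except PersistKillError as e:
        print(e)  # 'persist kill' is not allowed for DTM
```

Whether TSID should also be placed in PROTECTED_TYPES is one of the open
questions listed above.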
Location of merged changes:
git remote add zcorrea_fork [email protected]:zcorrea/incubator-trafodion
Branch: zcorrea_fork/TRAFODION-2001
Impact to other components:
Hadoop/Trafodion Installation
* The ability to add and remove servers in an existing cluster implies
  provisioning and removing the operational resources of those servers.
  o Trafodion depends on Hadoop, so there is an implied order of provisioning
    and operational readiness when adding servers to a cluster.
  o This order is reversed when removing servers from a cluster.
Trafodion components
* Existing functionality in Trafodion assumes that once an instance is
  started, its static configuration does not change. Nodes may go down (i.e.,
  fail), but the number of configured nodes remains static. This will no longer
  be true, as node membership will expand and contract during the lifetime of
  an instance after initial instance startup.
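As a minimal sketch of the two impacts above, the following Python separates
configured membership from the set of nodes that are up, and applies the
Hadoop-then-Trafodion order when adding a server and the reverse when removing
one. This is hypothetical: the Membership class, the layer names, and the log
strings are illustrative assumptions, not Trafodion or Hadoop APIs. Only the
ordering rule and the configured-versus-up distinction come from the text above.

```python
# Hypothetical sketch: only the ordering rule (Hadoop before Trafodion on add,
# reverse on remove) comes from the proposal; all names here are illustrative.
LAYERS = ["hadoop", "trafodion"]  # dependency order: each layer needs the ones before it


class Membership:
    def __init__(self, hosts: list[str]) -> None:
        self.configured: set[str] = set(hosts)  # no longer fixed after startup
        self.up: set[str] = set(hosts)          # failures shrink only this set
        self.log: list[str] = []

    def add_server(self, host: str) -> None:
        for layer in LAYERS:  # provision in dependency order
            self.log.append(f"provision {layer} on {host}")
        # The node joins the membership only once every layer is ready.
        self.configured.add(host)
        self.up.add(host)

    def remove_server(self, host: str) -> None:
        # Leave the membership first, then tear down in reverse order.
        self.up.discard(host)
        self.configured.discard(host)
        for layer in reversed(LAYERS):
            self.log.append(f"remove {layer} from {host}")

    def node_down(self, host: str) -> None:
        # A failure changes which nodes are up, not the configured membership.
        self.up.discard(host)
```

For example, after Membership(["n0", "n1"]).node_down("n1"), the configured
set still holds both nodes while only "n0" is up; add_server and remove_server
are the operations that actually change the configured set.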
I look forward to your feedback,
Zalo
Gonzalo Correa
--- End Message ---