I would start with NODES and NODE_ATTRIBUTES as the simplest ones.

On Tue, Feb 13, 2018 at 4:10 AM, Denis Magda <dma...@apache.org> wrote:
> Alex P, sounds like a good plan for me.
>
> Vladimir, do you have any suggestions or corrections?
>
> —
> Denis
>
> > On Feb 12, 2018, at 4:57 AM, Alex Plehanov <plehanov.a...@gmail.com> wrote:
> >
> > The views engine and the first view are almost ready to merge (review
> > comments are resolved). Which views should we take next? My proposal:
> > NODES, NODE_ATTRIBUTES, NODE_METRICS, NODE_HOSTS and NODE_ADDRESSES,
> > since these views are clear and all topology data is available on each
> > node. Any objections?
> >
> > 2018-01-25 16:27 GMT+03:00 Alex Plehanov <plehanov.a...@gmail.com>:
> >
> >> Anton, Vladimir, I've made some fixes. There is only one view left,
> >> and it's renamed to 'IGNITE.LOCAL_TRANSACTIONS'.
> >>
> >> High-level design of the solution:
> >> When IgniteH2Indexing starts, it creates and starts a new
> >> GridH2SysViewProcessor, which creates and registers in H2 (via its own
> >> table engine) all implementations of system views. Each system view
> >> implementation extends the base abstract class GridH2SysView. A view
> >> implementation describes its columns, their types and indexes in the
> >> constructor, and must override the getRows method for data retrieval
> >> (this method is called by the H2-compatible table and index
> >> implementations for Ignite system views). Almost no fixes to the
> >> existing parsing engine were needed, except in some places where a
> >> GridH2Table instance was expected but system views use another class.
> >>
> >> New PR: [1]. Please have a look. (A sketch of this view API appears a
> >> little further down in this thread.)
> >>
> >> [1] https://github.com/apache/ignite/pull/3433
> >>
> >> 2018-01-24 19:12 GMT+03:00 Anton Vinogradov <avinogra...@gridgain.com>:
> >>
> >>> I've created IEP-13 [1] to cover all cases.
> >>> Feel free to create issues.
> >>>
> >>> [1]
> >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75962769
> >>>
> >>> On Wed, Jan 24, 2018 at 6:10 PM, Vladimir Ozerov <voze...@gridgain.com>
> >>> wrote:
> >>>
> >>>> Let's start with a single and most simple view, e.g.
> >>>> LOCAL_TRANSACTIONS. We will review and merge it along with the
> >>>> necessary infrastructure, then handle the rest of the views in
> >>>> separate tickets and separate focused discussions.
> >>>>
> >>>> On Wed, Jan 24, 2018 at 5:29 PM, Alex Plehanov <plehanov.a...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> 1) It's not a principal point, I can change the schema. The
> >>>>> INFORMATION_SCHEMA was used because it already exists and is usually
> >>>>> used for metadata tables and views. Your proposal is to use the
> >>>>> schema "IGNITE", do I understand you right? BTW, for now we can't
> >>>>> query other (H2) meta tables from the INFORMATION_SCHEMA, so the
> >>>>> Ignite system views are the only views available to query from this
> >>>>> schema.
> >>>>> 2) Exactly for this reason the IGNITE_INSTANCE view is useful: to
> >>>>> determine which node we are connected to.
> >>>>> 3) As the first phase, in my opinion, local views will be enough.
> >>>>> Performance and caching of distributed views should be discussed in
> >>>>> the next phases, when the distributed views implementation is
> >>>>> planned. In the current implementation I tried to use indexing for
> >>>>> local views wherever possible.
> >>>>> 4) I don't think that JVM info is more critical information than,
> >>>>> for example, caches or nodes information. When are authorization
> >>>>> capabilities planned to be implemented?
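For readers following the design described in the 2018-01-25 message above, here is a minimal, self-contained sketch of what a view implementation might look like. Only the GridH2SysView base class and the getRows method come from the PR description; everything else (the constructor signature, the stand-in column handling, the NodesSysView example) is a simplified assumption, not the actual Ignite code.

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-in for the base class registered with H2's table engine.
abstract class GridH2SysView {
    final String name;          // view name, e.g. "NODES"
    final List<String> columns; // column names declared by the subclass

    protected GridH2SysView(String name, String... columns) {
        this.name = name;
        this.columns = Arrays.asList(columns);
    }

    // Called by the H2-compatible table/index implementation; yields one
    // Object[] per row, in the order the columns were declared.
    public abstract Iterator<Object[]> getRows();
}

// Example subclass: a NODES view over some topology snapshot.
class NodesSysView extends GridH2SysView {
    private final List<UUID> nodeIds; // stand-in for local topology data

    NodesSysView(List<UUID> nodeIds) {
        super("NODES", "ID", "IS_CLIENT");
        this.nodeIds = nodeIds;
    }

    @Override
    public Iterator<Object[]> getRows() {
        // Materialize one row per node; a real view would read live data.
        return nodeIds.stream()
            .map(id -> new Object[] { id, Boolean.FALSE })
            .iterator();
    }

    public static void main(String[] args) {
        GridH2SysView view = new NodesSysView(Arrays.asList(UUID.randomUUID()));
        view.getRows().forEachRemaining(row -> System.out.println(Arrays.toString(row)));
    }
}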
> >>>>>
> >>>>> About local data: yes, we can rename all currently implemented views
> >>>>> for the local node data as LOCAL_..., and create (someday) new
> >>>>> whole-cluster views (which use distributed requests) without a
> >>>>> prefix or, for example, with a CLUSTER_ prefix. But some views can
> >>>>> show all cluster information using only local node data, without
> >>>>> distributed requests (for example IGNITE_NODE_METRICS,
> >>>>> IGNITE_PART_ASSIGNMENT, IGNITE_PART_ALLOCATION, IGNITE_NODES, etc.).
> >>>>> Are they local or cluster views in this concept? Which prefix should
> >>>>> be used? And what about caches? Are they local or cluster? On a
> >>>>> local node we can see cluster-wide caches (replicated and
> >>>>> distributed) and caches for the current node only. The local caches
> >>>>> list may differ from node to node. Which prefix should be used for
> >>>>> this view? And one more thing: for some views there is no sense in
> >>>>> making them cluster-wide (for example IGNITE_INSTANCE). Should we
> >>>>> name it LOCAL_INSTANCE without creating an INSTANCE view?
> >>>>>
> >>>>> So, next steps: split the PR, change the schema name (IGNITE?),
> >>>>> change the view name for caches (CACHES, LOCAL_CACHES?).
> >>>>>
> >>>>> 2018-01-24 13:03 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> >>>>>
> >>>>>> Hi Alex,
> >>>>>>
> >>>>>> System views could be an extremely valuable addition to Ignite.
> >>>>>> Ideally, a user should be able to monitor and manage the state of
> >>>>>> the whole cluster from a single SQL command line. We have had plans
> >>>>>> to implement this for a very long time. However, it is a very
> >>>>>> sensitive task which has to take a lot of moving pieces into
> >>>>>> account, such as usability, consistency, performance, security, etc.
> >>>>>>
> >>>>>> Let me point out several major concerns I see at the moment:
> >>>>>>
> >>>>>> 1) Usability: INFORMATION_SCHEMA
> >>>>>> This schema is part of the ANSI SQL standard. When creating system
> >>>>>> views, some vendors prefer to store them in a completely different
> >>>>>> predefined schema (Oracle, MS SQL). Others prefer to keep them in
> >>>>>> INFORMATION_SCHEMA directly. Both approaches could work. However,
> >>>>>> the latter breaks separation of concerns - we store typical
> >>>>>> metadata next to possibly sensitive system data. It also makes
> >>>>>> security management more complex - system data is very sensitive,
> >>>>>> and then we cannot simply grant a user access to INFORMATION_SCHEMA.
> >>>>>> Instead, we have to grant access on a per-view basis. For this
> >>>>>> reason my preference is to store system tables in a separate
> >>>>>> schema, not in INFORMATION_SCHEMA.
> >>>>>>
> >>>>>> 2) Consistency: local data
> >>>>>> One of the implemented views is GridH2SysViewImplInstance. Normally
> >>>>>> SQL users communicate with Ignite through JDBC/ODBC drivers. These
> >>>>>> drivers are connected to a single node, typically a client node.
> >>>>>> Moreover, we will introduce a high-availability feature where
> >>>>>> drivers are able to connect to any address from a predefined list.
> >>>>>> This renders the view useless, as you do not know which node you
> >>>>>> are connected to. Also, local-only data cannot be joined in the
> >>>>>> general case - you will receive different results on different
> >>>>>> nodes (see the sketch below). The same goes for transactions, JVM
> >>>>>> info, etc.
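To make the concern in point 2 concrete, here is a hedged sketch of how the same query can legitimately return different results depending on which node the driver connects to. It assumes the Ignite JDBC thin driver URL format and the IGNITE.LOCAL_TRANSACTIONS view name proposed in this thread; the node addresses are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LocalViewDemo {
    // Count active transactions as reported by one specific node.
    static int countLocalTxs(String nodeAddr) throws Exception {
        // Assumes ignite-core is on the classpath so the thin driver registers itself.
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://" + nodeAddr);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM IGNITE.LOCAL_TRANSACTIONS")) {
            rs.next();
            return rs.getInt(1);
        }
    }

    public static void main(String[] args) throws Exception {
        // Two connections, two nodes: the counts may differ, because each
        // node reports only its own active transactions.
        System.out.println("node1: " + countLocalTxs("192.168.0.1"));
        System.out.println("node2: " + countLocalTxs("192.168.0.2"));
    }
}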
> >>>>>>
> >>>>>> 3) Performance
> >>>>>> Suppose we fixed the consistency of transactions and now this view
> >>>>>> shows transactions in the whole cluster with the possibility to
> >>>>>> filter them by node - this is what a user would expect out of the
> >>>>>> box. Another problem appears then - performance. How would we
> >>>>>> collect the necessary data? How would we handle joins, when a
> >>>>>> particular view could be scanned multiple times during query
> >>>>>> execution? How do we achieve sensible consistency? Most probably we
> >>>>>> would collect remote data once when the query is started, cache it
> >>>>>> somehow at the query session level, and then re-use it during
> >>>>>> joins. But again, this should be discussed separately.
> >>>>>>
> >>>>>> 4) Security: JVM info
> >>>>>> We should define clear boundaries of what info is exposed. JVM data
> >>>>>> along with running threads is critically sensitive information. We
> >>>>>> should not expose it until we have authorization capabilities.
> >>>>>>
> >>>>>> In order to start moving this code from prototype to production
> >>>>>> state we should start with the most simple and consistent views,
> >>>>>> e.g. IGNITE_CACHES. Let's move it to a separate PR, review the
> >>>>>> infrastructure code, review the view implementation, agree on
> >>>>>> proper naming and placement, and merge it. Then each and every view
> >>>>>> (or group of related views) should be discussed and reviewed
> >>>>>> separately.
> >>>>>>
> >>>>>> As far as the node-local stuff goes, maybe we should move it to a
> >>>>>> separate schema, or mark it with a special prefix, e.g.
> >>>>>> "IGNITE.TRANSACTIONS" - all transactions in the cluster,
> >>>>>> "IGNITE.LOCAL_TRANSACTIONS" - transactions on the local node. In
> >>>>>> this case we will be able to merge the "local" stuff shortly, and
> >>>>>> implement the more complex but at the same time much more useful
> >>>>>> distributed stuff later on.
> >>>>>>
> >>>>>> Makes sense?
> >>>>>>
> >>>>>> Vladimir.
> >>>>>>
> >>>>>> On Tue, Jan 23, 2018 at 8:30 PM, Alex Plehanov <plehanov.a...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello, Igniters!
> >>>>>>>
> >>>>>>> For Ignite diagnostics it's usually helpful to get some
> >>>>>>> information about Ignite internals. But currently, in my opinion,
> >>>>>>> there are no convenient tools for this purpose:
> >>>>>>>
> >>>>>>> · Some issues can be solved by analyzing log files. Log files are
> >>>>>>> useful for dumps, but sometimes they are difficult to read. Also,
> >>>>>>> interesting metrics can't be obtained at runtime on request; we
> >>>>>>> need to wait until Ignite writes these metrics on a timeout or
> >>>>>>> other event.
> >>>>>>>
> >>>>>>> · JMX is useful for scalar metrics. Complex and table data can
> >>>>>>> also be received, but it's difficult to read, filter and sort it
> >>>>>>> without processing by specialized external tools. For the most
> >>>>>>> frequently used cases, almost-duplicate metrics are created just
> >>>>>>> to show data in an easy-to-read form.
> >>>>>>>
> >>>>>>> · Web-console is able to show table and complex data. Perhaps
> >>>>>>> someday web-console will contain all the necessary dashboards for
> >>>>>>> most problem investigations, but some non-trivial queries will not
> >>>>>>> be covered anyway. Also, web-console needs additional
> >>>>>>> infrastructure to work.
> >>>>>>>
> >>>>>>> · External "home-made" tools can be used for non-trivial cases.
> >>>>>>> They cover highly specialized cases and usually can't be used as
> >>>>>>> general-purpose tools.
> >>>>>>>
> >>>>>>> Sometimes we are forced to use more than one tool and join data by
> >>>>>>> hand (for example, a current thread dump and data from logs).
> >>>>>>>
> >>>>>>> For diagnostic purposes an RDBMS often provides system views (for
> >>>>>>> example, DBA_% and V$% in Oracle), which can be queried by SQL.
> >>>>>>> This solution makes all internal diagnostic information available
> >>>>>>> in a readable form (with all possible filters and projections)
> >>>>>>> without using any other internal or external tools. My proposal is
> >>>>>>> to create similar system views in Ignite.
> >>>>>>>
> >>>>>>> I implemented a working prototype (PR: [1]). It contains the
> >>>>>>> following views:
> >>>>>>>
> >>>>>>> IGNITE_SYSTEM_VIEWS          Registered system views
> >>>>>>> IGNITE_INSTANCE              Ignite instance
> >>>>>>> IGNITE_JVM_THREADS           JVM threads
> >>>>>>> IGNITE_JVM_RUNTIME           JVM runtime
> >>>>>>> IGNITE_JVM_OS                JVM operating system
> >>>>>>> IGNITE_CACHES                Ignite caches
> >>>>>>> IGNITE_CACHE_CLUSTER_METRICS Ignite cache cluster metrics
> >>>>>>> IGNITE_CACHE_NODE_METRICS    Ignite cache node metrics
> >>>>>>> IGNITE_CACHE_GROUPS          Cache groups
> >>>>>>> IGNITE_NODES                 Nodes in topology
> >>>>>>> IGNITE_NODE_HOSTS            Node hosts
> >>>>>>> IGNITE_NODE_ADDRESSES        Node addresses
> >>>>>>> IGNITE_NODE_ATTRIBUTES       Node attributes
> >>>>>>> IGNITE_NODE_METRICS          Node metrics
> >>>>>>> IGNITE_TRANSACTIONS          Active transactions
> >>>>>>> IGNITE_TRANSACTION_ENTRIES   Cache entries used by transactions
> >>>>>>> IGNITE_TASKS                 Active tasks
> >>>>>>> IGNITE_PART_ASSIGNMENT       Partition assignment map
> >>>>>>> IGNITE_PART_ALLOCATION       Partition allocation map
> >>>>>>>
> >>>>>>> Many more useful views could be implemented (executors
> >>>>>>> diagnostics, SPIs diagnostics, etc.).
> >>>>>>>
> >>>>>>> Some usage examples:
> >>>>>>>
> >>>>>>> Cache groups and their partitions that are used by transactions
> >>>>>>> running longer than 5 minutes:
> >>>>>>>
> >>>>>>> SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, COUNT(*) AS ENTITIES_CNT
> >>>>>>> FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
> >>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID = te.XID
> >>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
> >>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
> >>>>>>> WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
> >>>>>>> GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION
> >>>>>>>
> >>>>>>> Average CPU load on server nodes grouped by operating system:
> >>>>>>>
> >>>>>>> SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
> >>>>>>> FROM INFORMATION_SCHEMA.IGNITE_NODES n
> >>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na ON na.NODE_ID = n.ID
> >>>>>>>   AND na.NAME = 'os.name'
> >>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = n.ID
> >>>>>>> WHERE n.IS_CLIENT = false
> >>>>>>> GROUP BY na.VALUE
> >>>>>>>
> >>>>>>> Top 5 nodes by puts to the cache 'cache':
> >>>>>>>
> >>>>>>> SELECT cm.NODE_ID, cm.CACHE_PUTS
> >>>>>>> FROM INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
> >>>>>>> WHERE cm.CACHE_NAME = 'cache'
> >>>>>>> ORDER BY cm.CACHE_PUTS DESC
> >>>>>>> LIMIT 5
> >>>>>>>
> >>>>>>> Is this implementation interesting to anyone else? Maybe some
> >>>>>>> views are redundant? Which additional first-priority views must be
> >>>>>>> implemented? Any other thoughts or proposals?
> >>>>>>>
> >>>>>>> [1] https://github.com/apache/ignite/pull/3413
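Following up on the suggestion at the top of the thread to start with NODES and NODE_ATTRIBUTES, here is a hedged sketch of the kind of query those two views would enable, modeled on the examples above. The IGNITE schema, the unprefixed view names, and the column names follow proposals made in this thread; the final merged implementation may use different names.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NodesByOsDemo {
    public static void main(String[] args) throws Exception {
        // Count nodes per operating system by joining the node list with
        // the per-node attribute rows (view and column names assumed).
        String sql =
            "SELECT na.VALUE AS OS, COUNT(n.ID) AS NODES_CNT " +
            "FROM IGNITE.NODES n " +
            "JOIN IGNITE.NODE_ATTRIBUTES na ON na.NODE_ID = n.ID " +
            "  AND na.NAME = 'os.name' " +
            "GROUP BY na.VALUE";

        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next())
                System.out.println(rs.getString("OS") + ": " + rs.getInt("NODES_CNT"));
        }
    }
}

The join shape mirrors the os.name example earlier in the thread: attributes are modeled as name/value rows keyed by node ID, so any attribute can be filtered or grouped on without schema changes.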