The views engine and the first view are almost ready to merge (review comments are resolved). Which views should we take next? My proposal: NODES, NODE_ATTRIBUTES, NODE_METRICS, NODE_HOSTS and NODE_ADDRESSES, since these views are clear and all topology data is available on each node. Any objections?
2018-01-25 16:27 GMT+03:00 Alex Plehanov <plehanov.a...@gmail.com>:

> Anton, Vladimir, I've made some fixes. There is only one view left, and
> it has been renamed to 'IGNITE.LOCAL_TRANSACTIONS'.
>
> High-level design of the solution:
> When IgniteH2Indexing starts, it creates and starts a new
> GridH2SysViewProcessor, which creates and registers in H2 (via its own
> table engine) all implementations of system views. Each system view
> implementation extends the base abstract class GridH2SysView. A view
> implementation describes its columns, their types and indexes in the
> constructor and must override the getRows method for data retrieval
> (this method is called by the H2-compatible table and index
> implementations for Ignite system views). Almost no fixes to the
> existing parsing engine were made, except in a few places where a
> GridH2Table instance was expected but system views use another class.
>
> New PR: [1]. Please have a look.
>
> [1] https://github.com/apache/ignite/pull/3433
>
> 2018-01-24 19:12 GMT+03:00 Anton Vinogradov <avinogra...@gridgain.com>:
>
>> I've created IEP-13 [1] to cover all cases.
>> Feel free to create issues.
>>
>> [1]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75962769
>>
>> On Wed, Jan 24, 2018 at 6:10 PM, Vladimir Ozerov <voze...@gridgain.com>
>> wrote:
>>
>>> Let's start with a single, most simple view, e.g. LOCAL_TRANSACTIONS.
>>> We will review and merge it along with the necessary infrastructure,
>>> then handle the rest of the views in separate tickets and separate
>>> focused discussions.
>>>
>>> On Wed, Jan 24, 2018 at 5:29 PM, Alex Plehanov <plehanov.a...@gmail.com>
>>> wrote:
>>>
>>>> 1) It's not a principal point, I can change the schema.
>>>> INFORMATION_SCHEMA was used because it already exists and is usually
>>>> used for metadata tables and views. Your proposal is to use the schema
>>>> "IGNITE", do I understand you right?
>>>> BTW, for now we can't query the other (H2) meta tables from
>>>> INFORMATION_SCHEMA, so the Ignite system views are the only views
>>>> available to query from this schema.
>>>> 2) Exactly for this reason the IGNITE_INSTANCE view is useful: to
>>>> determine which node we are connected to.
>>>> 3) As the first phase, in my opinion, local views will be enough.
>>>> Performance and caching of distributed views should be discussed in
>>>> the next phases, when the distributed views implementation is planned.
>>>> In the current implementation I tried to use indexing for local views
>>>> wherever possible.
>>>> 4) I don't think that JVM info is more critical information than, for
>>>> example, cache or node information. When are authorization
>>>> capabilities planned to be implemented?
>>>>
>>>> About local data: yes, we can rename all currently implemented views
>>>> for the local node data to LOCAL_..., and (someday) create new
>>>> whole-cluster views (which use distributed requests) without a prefix
>>>> or, for example, with a CLUSTER_ prefix. But some views can show all
>>>> cluster information using only local node data, without distributed
>>>> requests (for example IGNITE_NODE_METRICS, IGNITE_PART_ASSIGNMENT,
>>>> IGNITE_PART_ALLOCATION, IGNITE_NODES, etc.). Are they local or cluster
>>>> views in this concept? Which prefix should be used? And what about
>>>> caches? Are they local or cluster? On a local node we can see
>>>> cluster-wide caches (replicated and distributed) as well as caches for
>>>> the current node only. The local caches list may differ from node to
>>>> node. Which prefix should be used for this view? And one more thing:
>>>> for some views there is no sense in making them cluster-wide (for
>>>> example IGNITE_INSTANCE). Should we name it LOCAL_INSTANCE without
>>>> creating an INSTANCE view?
>>>>
>>>> So, next steps: split the PR, change the schema name (IGNITE?), change
>>>> the view name for caches (CACHES, LOCAL_CACHES?).
>>>>
>>>> 2018-01-24 13:03 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>>>
>>>>> Hi Alex,
>>>>>
>>>>> System views could be an extremely valuable addition to Ignite.
>>>>> Ideally, a user should be able to monitor and manage the state of the
>>>>> whole cluster from a single SQL command line. We have had plans to
>>>>> implement this for a very long time. However, it is a very sensitive
>>>>> task which has to take a lot of moving pieces into account, such as
>>>>> usability, consistency, performance, security, etc.
>>>>>
>>>>> Let me point out several major concerns I see at the moment:
>>>>>
>>>>> 1) Usability: INFORMATION_SCHEMA
>>>>> This schema is part of the ANSI SQL standard. When creating system
>>>>> views, some vendors prefer to store them in a completely different
>>>>> predefined schema (Oracle, MS SQL). Others prefer to keep them in
>>>>> INFORMATION_SCHEMA directly. Both approaches could work. However, the
>>>>> latter breaks separation of concerns - we store typical metadata next
>>>>> to possibly sensitive system data. It also makes security management
>>>>> more complex - system data is very sensitive, and we can no longer
>>>>> simply grant a user access to INFORMATION_SCHEMA; instead, we have to
>>>>> grant that access on a per-view basis. For this reason my preference
>>>>> is to store system tables in a separate schema, not in
>>>>> INFORMATION_SCHEMA.
>>>>>
>>>>> 2) Consistency: local data
>>>>> One of the implemented views is GridH2SysViewImplInstance. Normally
>>>>> SQL users communicate with Ignite through JDBC/ODBC drivers. These
>>>>> drivers are connected to a single node, typically a client node.
>>>>> Moreover, we will introduce a high-availability feature where drivers
>>>>> are able to connect to any address from a predefined list. That
>>>>> renders this view useless, as you do not know which node you are
>>>>> connected to. Also, local-only data cannot be joined in the general
>>>>> case - you will receive different results on different nodes. The
>>>>> same goes for transactions, JVM info, etc.
>>>>>
>>>>> 3) Performance
>>>>> Suppose we fixed the consistency of transactions, and this view now
>>>>> shows transactions in the whole cluster with the possibility to
>>>>> filter them by nodes - this is what a user would expect out of the
>>>>> box. Another problem then appears - performance. How would we collect
>>>>> the necessary data? How would we handle joins, when a particular view
>>>>> could be scanned multiple times during query execution? How do we
>>>>> achieve sensible consistency? Most probably we would collect remote
>>>>> data once when the query starts, cache it somehow at the query
>>>>> session level, and then re-use it during joins. But again, this
>>>>> should be discussed separately.
>>>>>
>>>>> 4) Security: JVM info
>>>>> We should define clear boundaries of what info is exposed. JVM data,
>>>>> along with running threads, is critically sensitive information. We
>>>>> should not expose it until we have authorization capabilities.
>>>>>
>>>>> In order to start moving this code from prototype to production state
>>>>> we should start with the most simple and consistent views, e.g.
>>>>> IGNITE_CACHES. Let's move it to a separate PR, review the
>>>>> infrastructure code, review the view implementation, agree on proper
>>>>> naming and placement, and merge it. Then each and every view (or
>>>>> group of related views) should be discussed and reviewed separately.
>>>>>
>>>>> As far as node-local stuff goes, maybe we should move it to a
>>>>> separate schema, or mark it with a special prefix. E.g.
>>>>> "IGNITE.TRANSACTIONS" - all transactions in the cluster,
>>>>> "IGNITE.LOCAL_TRANSACTIONS" - transactions on the local node. In this
>>>>> case we will be able to merge the "local" stuff shortly, and
>>>>> implement the more complex but at the same time much more useful
>>>>> distributed stuff later on.
>>>>>
>>>>> Makes sense?
>>>>>
>>>>> Vladimir.
>>>>>
>>>>> On Tue, Jan 23, 2018 at 8:30 PM, Alex Plehanov <plehanov.a...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello, Igniters!
>>>>>>
>>>>>> For Ignite diagnostics it is usually helpful to get some information
>>>>>> about Ignite internals. But currently, in my opinion, there are no
>>>>>> convenient tools for this purpose:
>>>>>>
>>>>>> · Some issues can be solved by analyzing log files. Log files are
>>>>>> useful for dumps, but sometimes they are difficult to read. Also,
>>>>>> interesting metrics can't be obtained at runtime on request; we have
>>>>>> to wait until Ignite writes these metrics on a timeout or some other
>>>>>> event.
>>>>>>
>>>>>> · JMX is useful for scalar metrics. Complex and tabular data can
>>>>>> also be retrieved, but it is difficult to read, filter and sort
>>>>>> without processing by specialized external tools. For the most
>>>>>> frequently used cases, almost-duplicate metrics are created just to
>>>>>> show data in an easy-to-read form.
>>>>>>
>>>>>> · Web-console is able to show tabular and complex data. Perhaps
>>>>>> someday web-console will contain all the necessary dashboards for
>>>>>> most problem investigations, but some non-trivial queries will not
>>>>>> be covered anyway. Also, web-console needs additional infrastructure
>>>>>> to work.
>>>>>>
>>>>>> · External "home-made" tools can be used for non-trivial cases. They
>>>>>> cover highly specialized cases and usually can't be used as
>>>>>> general-purpose tools.
>>>>>>
>>>>>> Sometimes we are forced to use more than one tool and join the data
>>>>>> by hand (for example, a current thread dump and data from logs).
>>>>>>
>>>>>> RDBMSs often provide system views for diagnostic purposes (for
>>>>>> example, DBA_% and V$% in Oracle), which can be queried by SQL. This
>>>>>> approach makes all internal diagnostic information available in a
>>>>>> readable form (with all possible filters and projections) without
>>>>>> using any other internal or external tools. My proposal is to create
>>>>>> similar system views in Ignite.
>>>>>>
>>>>>> I have implemented a working prototype (PR: [1]). It contains the
>>>>>> following views:
>>>>>>
>>>>>> IGNITE_SYSTEM_VIEWS - Registered system views
>>>>>> IGNITE_INSTANCE - Ignite instance
>>>>>> IGNITE_JVM_THREADS - JVM threads
>>>>>> IGNITE_JVM_RUNTIME - JVM runtime
>>>>>> IGNITE_JVM_OS - JVM operating system
>>>>>> IGNITE_CACHES - Ignite caches
>>>>>> IGNITE_CACHE_CLUSTER_METRICS - Ignite cache cluster metrics
>>>>>> IGNITE_CACHE_NODE_METRICS - Ignite cache node metrics
>>>>>> IGNITE_CACHE_GROUPS - Cache groups
>>>>>> IGNITE_NODES - Nodes in topology
>>>>>> IGNITE_NODE_HOSTS - Node hosts
>>>>>> IGNITE_NODE_ADDRESSES - Node addresses
>>>>>> IGNITE_NODE_ATTRIBUTES - Node attributes
>>>>>> IGNITE_NODE_METRICS - Node metrics
>>>>>> IGNITE_TRANSACTIONS - Active transactions
>>>>>> IGNITE_TRANSACTION_ENTRIES - Cache entries used by transactions
>>>>>> IGNITE_TASKS - Active tasks
>>>>>> IGNITE_PART_ASSIGNMENT - Partition assignment map
>>>>>> IGNITE_PART_ALLOCATION - Partition allocation map
>>>>>>
>>>>>> Many more useful views could be implemented (executors diagnostics,
>>>>>> SPIs diagnostics, etc.).
>>>>>>
>>>>>> Some usage examples:
>>>>>>
>>>>>> Cache groups and their partitions used by transactions running for
>>>>>> more than 5 minutes:
>>>>>>
>>>>>> SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, COUNT(*) AS ENTITIES_CNT
>>>>>> FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID = te.XID
>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
>>>>>> WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
>>>>>> GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION
>>>>>>
>>>>>> Average CPU load on server nodes, grouped by operating system:
>>>>>>
>>>>>> SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
>>>>>> FROM INFORMATION_SCHEMA.IGNITE_NODES n
>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na ON na.NODE_ID = n.ID
>>>>>>   AND na.NAME = 'os.name'
>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = n.ID
>>>>>> WHERE n.IS_CLIENT = false
>>>>>> GROUP BY na.VALUE
>>>>>>
>>>>>> Top 5 nodes by puts to cache 'cache':
>>>>>>
>>>>>> SELECT cm.NODE_ID, cm.CACHE_PUTS
>>>>>> FROM INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
>>>>>> WHERE cm.CACHE_NAME = 'cache'
>>>>>> ORDER BY cm.CACHE_PUTS DESC
>>>>>> LIMIT 5
>>>>>>
>>>>>> Is this implementation interesting to anyone else? Maybe some views
>>>>>> are redundant? Which additional first-priority views should be
>>>>>> implemented? Any other thoughts or proposals?
>>>>>>
>>>>>> [1] https://github.com/apache/ignite/pull/3413
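[Editor's note] The high-level design described in the thread (a base abstract class GridH2SysView whose subclasses declare their columns in the constructor and override getRows, which the H2-compatible table and index wrappers call) could be sketched roughly as below. This is an illustrative sketch only: the method signatures, the SysViewLocalTransactions example and its columns are assumptions for demonstration, not the actual code from the PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Base class for Ignite system views: subclasses describe their columns
// in the constructor and implement getRows(), which the H2-compatible
// table and index implementations call to fetch data.
abstract class GridH2SysView {
    private final String name;
    private final List<String> cols;

    protected GridH2SysView(String name, String... cols) {
        this.name = name;
        this.cols = Arrays.asList(cols);
    }

    public String name() { return name; }

    public List<String> columns() { return cols; }

    // Returns the view's rows; first/last are optional search bounds that
    // an index implementation may pass in to narrow the scan.
    public abstract Iterable<Object[]> getRows(Object first, Object last);
}

// Hypothetical view over the node's active transactions. In a real
// implementation the rows would come from Ignite's transaction manager;
// here a single hard-coded row stands in for that data.
class SysViewLocalTransactions extends GridH2SysView {
    SysViewLocalTransactions() {
        super("LOCAL_TRANSACTIONS", "XID", "START_TIME", "STATE");
    }

    @Override public Iterable<Object[]> getRows(Object first, Object last) {
        return Collections.<Object[]>singletonList(
            new Object[] {"xid-1", System.currentTimeMillis(), "ACTIVE"});
    }
}
```

With this shape, the processor only needs to iterate over the registered GridH2SysView instances and wrap each one in an H2 table, while all view-specific logic stays in getRows.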