Hi Eric,

Awesome - everything's working great now.
So, as you've said, the SQL portion of Chukwa is deprecated, and the
HDFS-based replacement is six months out. What should I do to get the data
from the adapters -> collectors -> HDFS -> HICC? Is the HDFS-based HICC
replacement spec'ed out enough for others to contribute?

Thanks,
Kirk

Eric Yang wrote:
> Hi Kirk,
>
> 1. The host selector currently shows hostnames collected from the
> SystemMetrics table, so you need to have top, iostat, df, and sar data
> collected to populate the SystemMetrics table correctly. The hostname is
> also cached in the user session, so you will need to switch to a different
> cluster and switch back, or restart HICC, to flush the cached hostnames
> from the user session. The hostname selector should probably pick up
> hostnames from a different data source in a future release.
>
> 2. The server should run in UTC. Timezone support was never completely
> implemented, so a server in another timezone will not work correctly.
>
> 3. The SQL aggregator (deprecated, by the way) runs as part of dbAdmin.sh.
> This subsystem down-samples data from the weekly table into monthly,
> yearly, and decade tables. I wrote this submodule over a weekend for a
> prototype show-and-tell. I strongly recommend avoiding the SQL part of
> Chukwa altogether.
>
> Regards,
> Eric
>
> On 3/18/10 1:15 PM, "Kirk True" <k...@mustardgrain.com> wrote:
>
>> Hi Eric,
>>
>> I believe I have most of steps 1-5 working. Data from "/usr/bin/df" is
>> being collected, parsed, stuck into HDFS, and then pulled out again and
>> placed into MySQL. However, HICC isn't showing me my data just yet...
>>
>> The disk_2098_week table is filled out with several entries and looks
>> great. If I select my cluster from the "Cluster Selector" and "Last 12
>> Hours" from the "Time" widget, the "Disk Statistics" widget still says
>> "No Data available."
>>
>> It appears to be because part of the SQL query includes the host name,
>> which is coming across in the SQL parameters as "".
>> However, since the disk_2098_week table properly includes the host name,
>> nothing is returned by the query. Just for grins, I updated the table
>> manually in MySQL to blank out the host names, and I get a super cool,
>> pretty graph (which looks great, BTW).
>>
>> Additionally, if I select other time periods such as "Last 1 Hour", I see
>> the query is using UTC or something (at 1:00 PDT, I see the query is
>> using a range of 19:00-20:00). However, the data in MySQL is based on
>> PDT, so no matches are found. It appears that the "time_zone" session
>> attribute contains the value "UTC". Where is this coming from and how can
>> I change it?
>>
>> Problems:
>>
>> 1. How do I get the "Hosts Selector" in HICC to include my host name so
>> that the generated SQL queries are correct?
>> 2. How do I make the "time_zone" session parameter use PDT vs. UTC?
>> 3. How do I populate the other tables, such as "disk_489_month"?
>>
>> Thanks,
>> Kirk
>>
>> Eric Yang wrote:
>>
>>> The df command output is converted into the disk_xxxx_week table in
>>> MySQL, if I remember correctly. In MySQL, are the database tables
>>> getting created? Make sure that you have:
>>>
>>> <property>
>>>   <name>chukwa.post.demux.data.loader</name>
>>>   <value>org.apache.hadoop.chukwa.dataloader.MetricDataLoaderPool,org.apache.hadoop.chukwa.dataloader.FSMDataLoader</value>
>>> </property>
>>>
>>> in Chukwa-demux.conf.
>>>
>>> The rough picture of the data flow looks like this:
>>>
>>> 1. demux -> Generates chukwa record outputs.
>>> 2. archive -> Generates bigger files by compacting data sink files.
>>>    (Concurrent with step 1.)
>>> 3. postProcess -> Looks up which files were generated by the demux
>>>    process and dispatches them using different data loaders.
>>> 4. MetricDataLoaderPool -> Dispatches multiple threads to load chukwa
>>>    record files into different MDLs.
>>> 5. MetricDataLoader -> Loads sequence files into the database by record
>>>    type, as defined in mdl.xml.
>>> 6. HICC widgets have a descriptor language in JSON.
>>> You can find the widget descriptor files in
>>> hdfs://namenode:port/chukwa/hicc/widgets, which embed the full SQL
>>> template, like:
>>>
>>> Query="select cpu_user_pcnt from [system_metrics] where timestamp
>>> between [start] and [end]"
>>>
>>> This will output all the metrics in JSON format, and the HICC graphing
>>> widget will render the graph.
>>>
>>> If there is no data, look at postProcess.log and make sure the data
>>> loading is not throwing exceptions. Steps 3 to 6 are deprecated and will
>>> be replaced with something else. Hope this helps.
>>>
>>> Regards,
>>> Eric
>>>
>>> On 3/17/10 4:16 PM, "Kirk True" <k...@mustardgrain.com> wrote:
>>>
>>>> Hi Eric,
>>>>
>>>> Eric Yang wrote:
>>>>
>>>>> Hi Kirk,
>>>>>
>>>>> I am working on a design which removes MySQL from Chukwa. I am making
>>>>> this departure from MySQL because the MDL framework was for prototype
>>>>> purposes. It will not scale in a production system where Chukwa could
>>>>> be hosted on a large Hadoop cluster. HICC will serve data directly
>>>>> from HDFS in the future.
>>>>>
>>>>> Meanwhile, the dbAdmin.sh from Chukwa 0.3 is still compatible with the
>>>>> trunk version of Chukwa. You can load ChukwaRecords using the
>>>>> org.apache.hadoop.chukwa.dataloader.MetricDataLoader class or mdl.sh
>>>>> from Chukwa 0.3.
>>>>
>>>> I'm to the point where the "df" example is working and demux is storing
>>>> ChukwaRecord data in HDFS. When I run dbAdmin.sh from 0.3.0, no data is
>>>> getting updated in the database.
>>>>
>>>> My question is: what's the process to get a custom Demux implementation
>>>> to be viewable in HICC? Are the database tables magically created and
>>>> populated for me? Does HICC generate a widget for me?
>>>>
>>>> HICC looks very nice, but when I try to add a widget to my dashboard,
>>>> the preview always reads, "No Data Available."
>>>> I'm running $CHUKWA_HOME/bin/start-all.sh followed by
>>>> $CHUKWA_HOME/bin/dbAdmin.sh (which I've manually copied to the bin
>>>> directory).
>>>>
>>>> What am I missing?
>>>>
>>>> Thanks,
>>>> Kirk
>>>>
>>>>> The MetricDataLoader class will be marked as deprecated, and it will
>>>>> not be supported once we make the transition to Avro + TFile.
>>>>>
>>>>> Regards,
>>>>> Eric
>>>>>
>>>>> On 3/15/10 11:56 AM, "Kirk True" <k...@mustardgrain.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I recently switched to trunk as I was experiencing a lot of issues
>>>>>> with 0.3.0. In 0.3.0, there was a dbAdmin.sh script that would run
>>>>>> and try to stick data in MySQL from HDFS. However, that script is
>>>>>> gone, and when I run the system as built from trunk, nothing is ever
>>>>>> populated in the database. Where are the instructions for setting up
>>>>>> the HDFS -> MySQL data migration for HICC?
>>>>>>
>>>>>> Thanks,
>>>>>> Kirk
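
P.S. The timezone mismatch described above (1:00 PM PDT producing a query
range of 19:00-20:00) is exactly the UTC-7 offset of PDT. A minimal sketch of
the arithmetic, with a hypothetical date and wall-clock time just for
illustration:

```python
from datetime import datetime, timedelta, timezone

# PDT is UTC-7; HICC builds its "Last 1 Hour" range in UTC because the
# "time_zone" session attribute is "UTC".
PDT = timezone(timedelta(hours=-7))

# Hypothetical local wall clock: 1:00 PM PDT.
local_now = datetime(2010, 3, 18, 13, 0, tzinfo=PDT)

# The server computes the range in UTC:
end = local_now.astimezone(timezone.utc)   # 13:00 PDT -> 20:00 UTC
start = end - timedelta(hours=1)           # 19:00 UTC

print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"))  # 19:00 - 20:00

# Rows stamped with PDT wall-clock times (around 13:00) never fall inside
# [19:00, 20:00], so the query returns nothing and the widget shows
# "No Data available." Running the whole pipeline in UTC, as Eric
# recommends, removes the mismatch.
```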