Hi Kirk,

There is definitely a lot of work in the HDFS-based replacement. The first task is to redesign the ChukwaRecord storage with an index; this could be done using Tfile+Avro. HICC will be converted to read Tfile+Avro and transform the output to JSON using Jersey. The overall design is in CHUKWA-444. Most of the discussion should happen in the JIRA or on the chukwa-dev mailing list.
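To give a rough idea of the HICC side of this, something along these lines would sit behind Jersey. This is only a sketch: the resource path, class name, and the readRecords() helper are placeholders, not the actual design in CHUKWA-444.

    import java.io.IOException;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.QueryParam;
    import javax.ws.rs.core.MediaType;

    import org.codehaus.jackson.map.ObjectMapper;

    // Sketch only: path, class name, and readRecords() are placeholders.
    @Path("/metrics")
    public class MetricsResource {

      private final ObjectMapper mapper = new ObjectMapper();

      @GET
      @Path("/{recordType}")
      @Produces(MediaType.APPLICATION_JSON)
      public String getSeries(@PathParam("recordType") String recordType,
                              @QueryParam("start") long start,
                              @QueryParam("end") long end) throws IOException {
        // Scan the indexed Tfile+Avro records for [start, end] on HDFS and
        // hand the rows to the HICC graphing widgets as JSON.
        List<Map<String, Object>> rows = readRecords(recordType, start, end);
        return mapper.writeValueAsString(rows);
      }

      private List<Map<String, Object>> readRecords(String type, long start, long end)
          throws IOException {
        return Collections.emptyList(); // Tfile+Avro scan omitted in this sketch.
      }
    }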
Regards,
Eric

On 3/18/10 8:59 PM, "Kirk True" <k...@mustardgrain.com> wrote:

> Hi Eric,
>
> Awesome - everything's working great now.
>
> So, as you've said, the SQL portion of Chukwa is deprecated, and the
> HDFS-based replacement is six months out. What should I do to get the data
> from the adapters -> collectors -> HDFS -> HICC? Is the HDFS-based HICC
> replacement spec'ed out enough for others to contribute?
>
> Thanks,
> Kirk
>
> Eric Yang wrote:
>>
>> Hi Kirk,
>>
>> 1. The host selector currently shows hostnames collected from the
>> SystemMetrics table, so you need top, iostat, df, and sar collected to
>> populate the SystemMetrics table correctly. The hostnames are also cached
>> in the user session, so you will need to “switch to a different cluster,
>> and switch back” or restart HICC to flush the cached hostnames from the
>> user session. The hostname selector should probably pick up hostnames
>> from a different data source in a future release.
>>
>> 2. The server should run in UTC. Timezone support was never implemented
>> completely, so a server in another timezone will not work correctly.
>>
>> 3. The SQL aggregator (deprecated, by the way) runs as part of dbAdmin.sh;
>> this subsystem down-samples data from the weekly tables into monthly,
>> yearly, and decade tables. I wrote this submodule over a weekend as a
>> show-and-tell prototype. I strongly recommend avoiding the SQL part of
>> Chukwa altogether.
>>
>> Regards,
>> Eric
>>
>> On 3/18/10 1:15 PM, "Kirk True" <k...@mustardgrain.com> wrote:
>>
>>> Hi Eric,
>>>
>>> I believe I have most of steps 1-5 working. Data from "/usr/bin/df" is
>>> being collected, parsed, stuck into HDFS, and then pulled out again and
>>> placed into MySQL. However, HICC isn't showing me my data just yet...
>>>
>>> The disk_2098_week table is filled out with several entries and looks
>>> great. If I select my cluster from the "Cluster Selector" and "Last 12
>>> Hours" from the "Time" widget, the "Disk Statistics" widget still says
>>> "No Data available."
>>>
>>> It appears to be because part of the SQL query includes the host name,
>>> which is coming across in the SQL parameters as "". However, since the
>>> disk_2098_week table properly includes the host name, nothing is
>>> returned by the query. Just for grins, I updated the table manually in
>>> MySQL to blank out the host names and I get a super cool, pretty graph
>>> (which looks great, BTW).
>>>
>>> Additionally, if I select other time periods such as "Last 1 Hour", I see
>>> the query is using UTC or something (at 1:00 PDT, I see the query is
>>> using a range of 19:00-20:00). However, the data in MySQL is based on
>>> PDT, so no matches are found. It appears that the "time_zone" session
>>> attribute contains the value "UTC". Where is this coming from and how
>>> can I change it?
>>>
>>> Problems:
>>>
>>> 1. How do I get the "Hosts Selector" in HICC to include my host name so
>>>    that the generated SQL queries are correct?
>>> 2. How do I make the "time_zone" session parameter use PDT vs. UTC?
>>> 3. How do I populate the other tables, such as "disk_489_month"?
>>>
>>> Thanks,
>>> Kirk
>>>
>>> Eric Yang wrote:
>>>>
>>>> The df command output is converted into the disk_xxxx_week table in
>>>> MySQL, if I remember correctly. In MySQL, are the database tables
>>>> getting created?
>>>> Make sure that you have:
>>>>
>>>> <property>
>>>>   <name>chukwa.post.demux.data.loader</name>
>>>>   <value>org.apache.hadoop.chukwa.dataloader.MetricDataLoaderPool,org.apache.hadoop.chukwa.dataloader.FSMDataLoader</value>
>>>> </property>
>>>>
>>>> in Chukwa-demux.conf.
>>>>
>>>> The rough picture of the data flow looks like this:
>>>>
>>>> 1. demux -> generates ChukwaRecord outputs.
>>>> 2. archive -> generates bigger files by compacting data sink files
>>>>    (concurrent with step 1).
>>>> 3. postProcess -> looks up which files were generated by the demux
>>>>    process and dispatches them to the different data loaders.
>>>> 4. MetricDataLoaderPool -> dispatches multiple threads to load
>>>>    ChukwaRecord files through different MetricDataLoaders.
>>>> 5. MetricDataLoader -> loads the sequence files into the database by
>>>>    record type, as defined in mdl.xml.
>>>> 6. Each HICC widget has a descriptor written in JSON. You can find the
>>>>    widget descriptor files in hdfs://namenode:port/chukwa/hicc/widgets;
>>>>    they embed the full SQL template, like:
>>>>
>>>>    Query="select cpu_user_pcnt from [system_metrics] where timestamp
>>>>    between [start] and [end]"
>>>>
>>>> This will output the metrics in JSON format and the HICC graphing widget
>>>> will render the graph.
>>>>
>>>> If there is no data, look at postProcess.log and make sure the data
>>>> loading is not throwing exceptions. Steps 3 to 6 are deprecated and will
>>>> be replaced with something else. Hope this helps.
>>>>
>>>> Regards,
>>>> Eric
>>>>
>>>> On 3/17/10 4:16 PM, "Kirk True" <k...@mustardgrain.com> wrote:
>>>>
>>>>> Hi Eric,
>>>>>
>>>>> Eric Yang wrote:
>>>>>
>>>>>> Hi Kirk,
>>>>>>
>>>>>> I am working on a design which removes MySQL from Chukwa. I am making
>>>>>> this departure from MySQL because the MDL framework was built for
>>>>>> prototyping purposes. It will not scale in a production system where
>>>>>> Chukwa could be hosted on a large Hadoop cluster. HICC will serve data
>>>>>> directly from HDFS in the future.
>>>>>>
>>>>>> Meanwhile, dbAdmin.sh from Chukwa 0.3 is still compatible with the
>>>>>> trunk version of Chukwa. You can load ChukwaRecords using the
>>>>>> org.apache.hadoop.chukwa.dataloader.MetricDataLoader class or mdl.sh
>>>>>> from Chukwa 0.3.
>>>>>
>>>>> I'm to the point where the "df" example is working and demux is storing
>>>>> ChukwaRecord data in HDFS. When I run dbAdmin.sh from 0.3.0, no data is
>>>>> getting updated in the database.
>>>>>
>>>>> My question is: what's the process to get a custom Demux implementation
>>>>> to be viewable in HICC? Are the database tables magically created and
>>>>> populated for me? Does HICC generate a widget for me?
>>>>>
>>>>> HICC looks very nice, but when I try to add a widget to my dashboard,
>>>>> the preview always reads, "No Data Available." I'm running
>>>>> $CHUKWA_HOME/bin/start-all.sh followed by $CHUKWA_HOME/bin/dbAdmin.sh
>>>>> (which I've manually copied to the bin directory).
>>>>>
>>>>> What am I missing?
>>>>>
>>>>> Thanks,
>>>>> Kirk
>>>>>
>>>>>> The MetricDataLoader class will be marked as deprecated, and it will
>>>>>> not be supported once we make the transition to Avro + Tfile.
>>>>>>
>>>>>> Regards,
>>>>>> Eric
>>>>>>
>>>>>> On 3/15/10 11:56 AM, "Kirk True" <k...@mustardgrain.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I recently switched to trunk as I was experiencing a lot of issues
>>>>>>> with 0.3.0. In 0.3.0, there was a dbAdmin.sh script that would run
>>>>>>> and try to stick data in MySQL from HDFS. However, that script is
>>>>>>> gone, and when I run the system as built from trunk, nothing is ever
>>>>>>> populated in the database. Where are the instructions for setting up
>>>>>>> the HDFS -> MySQL data migration for HICC?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kirk
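To make step 6 above a bit more concrete: HICC expands the widget's SQL template by substituting the [system_metrics], [start], and [end] placeholders and then runs the result against MySQL. The snippet below is only a rough illustration of that expansion, not HICC's actual code; the partitioned table name, the timestamp format, and the JDBC settings are guesses.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.sql.Timestamp;

    public class WidgetQueryIllustration {
      public static void main(String[] args) throws Exception {
        // Template as it appears in the widget descriptor.
        String template =
            "select cpu_user_pcnt from [system_metrics] where timestamp between [start] and [end]";

        long end = System.currentTimeMillis();
        long start = end - 12L * 60 * 60 * 1000; // "Last 12 Hours"

        // Guess: [system_metrics] resolves to a time-partitioned table such as
        // system_metrics_2098_week, and timestamps are stored as DATETIME values.
        String sql = template
            .replace("[system_metrics]", "system_metrics_2098_week")
            .replace("[start]", "'" + new Timestamp(start) + "'")
            .replace("[end]", "'" + new Timestamp(end) + "'");

        Connection conn = DriverManager.getConnection(
            "jdbc:mysql://localhost:3306/chukwa", "user", "password");
        try {
          Statement stmt = conn.createStatement();
          ResultSet rs = stmt.executeQuery(sql);
          while (rs.next()) {
            System.out.println(rs.getDouble("cpu_user_pcnt"));
          }
        } finally {
          conn.close();
        }
      }
    }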