Hi Kirk,

There is definitely a lot of work in the HDFS-based replacement. The first task is to redesign the ChukwaRecord storage with an index; this could be done using Tfile+Avro. HICC will be converted to read Tfile+Avro and transform the output to JSON using Jersey. The overall design is in CHUKWA-444. Most of the discussion should happen in the JIRA or on the chukwa-dev mailing list.
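To give a rough idea of the HICC side of this, something along these lines would sit behind Jersey. This is only a sketch: the resource path, class name, and the readRecords() helper are placeholders, not the actual design in CHUKWA-444.

    import java.io.IOException;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.QueryParam;
    import javax.ws.rs.core.MediaType;

    import org.codehaus.jackson.map.ObjectMapper;

    // Sketch only: path, class name, and readRecords() are placeholders.
    @Path("/metrics")
    public class MetricsResource {

      private final ObjectMapper mapper = new ObjectMapper();

      @GET
      @Path("/{recordType}")
      @Produces(MediaType.APPLICATION_JSON)
      public String getSeries(@PathParam("recordType") String recordType,
                              @QueryParam("start") long start,
                              @QueryParam("end") long end) throws IOException {
        // Scan the indexed Tfile+Avro records for [start, end] on HDFS and
        // hand the rows to the HICC graphing widgets as JSON.
        List<Map<String, Object>> rows = readRecords(recordType, start, end);
        return mapper.writeValueAsString(rows);
      }

      private List<Map<String, Object>> readRecords(String type, long start, long end)
          throws IOException {
        return Collections.emptyList(); // Tfile+Avro scan omitted in this sketch.
      }
    }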
Regards,
Eric

On 3/18/10 8:59 PM, "Kirk True" <k...@mustardgrain.com> wrote:

> Hi Eric,
>
> Awesome - everything's working great now.
>
> So, as you've said, the SQL portion of Chukwa is deprecated, and the
> HDFS-based replacement is six months out. What should I do to get the data
> from the adapters -> collectors -> HDFS -> HICC? Is the HDFS-based HICC
> replacement spec'ed out enough for others to contribute?
>
> Thanks,
> Kirk
>
> Eric Yang wrote:
>>
>> Hi Kirk,
>>
>> 1. The host selector currently shows hostnames collected from the
>> SystemMetrics table, so you need top, iostat, df, and sar collected to
>> populate the SystemMetrics table correctly. The hostnames are also cached
>> in the user session, so you will need to “switch to a different cluster,
>> and switch back” or restart HICC to flush the cached hostnames from the
>> user session. The hostname selector should probably pick up hostnames
>> from a different data source in a future release.
>>
>> 2. The server should run in UTC. Timezone support was never implemented
>> completely, so a server in another timezone will not work correctly.
>>
>> 3. The SQL aggregator (deprecated, by the way) runs as part of dbAdmin.sh;
>> this subsystem down-samples data from the weekly tables into monthly,
>> yearly, and decade tables. I wrote this submodule over a weekend as a
>> show-and-tell prototype. I strongly recommend avoiding the SQL part of
>> Chukwa altogether.
>>
>> Regards,
>> Eric
>>
>> On 3/18/10 1:15 PM, "Kirk True" <k...@mustardgrain.com> wrote:
>>
>>> Hi Eric,
>>>
>>> I believe I have most of steps 1-5 working. Data from "/usr/bin/df" is
>>> being collected, parsed, stuck into HDFS, and then pulled out again and
>>> placed into MySQL. However, HICC isn't showing me my data just yet...
>>>
>>> The disk_2098_week table is filled out with several entries and looks
>>> great. If I select my cluster from the "Cluster Selector" and "Last 12
>>> Hours" from the "Time" widget, the "Disk Statistics" widget still says
>>> "No Data available."
>>>
>>> It appears to be because part of the SQL query includes the host name,
>>> which is coming across in the SQL parameters as "". However, since the
>>> disk_2098_week table properly includes the host name, nothing is
>>> returned by the query. Just for grins, I updated the table manually in
>>> MySQL to blank out the host names and I get a super cool, pretty graph
>>> (which looks great, BTW).
>>>
>>> Additionally, if I select other time periods such as "Last 1 Hour", I see
>>> the query is using UTC or something (at 1:00 PDT, I see the query is
>>> using a range of 19:00-20:00). However, the data in MySQL is based on
>>> PDT, so no matches are found. It appears that the "time_zone" session
>>> attribute contains the value "UTC". Where is this coming from and how
>>> can I change it?
>>>
>>> Problems:
>>>
>>> 1. How do I get the "Hosts Selector" in HICC to include my host name so
>>>    that the generated SQL queries are correct?
>>> 2. How do I make the "time_zone" session parameter use PDT vs. UTC?
>>> 3. How do I populate the other tables, such as "disk_489_month"?
>>>
>>> Thanks,
>>> Kirk
>>>
>>> Eric Yang wrote:
>>>>
>>>> The df command output is converted into the disk_xxxx_week table in
>>>> MySQL, if I remember correctly. In MySQL, are the database tables
>>>> getting created?
>>>> Make sure that you have:
>>>>
>>>> <property>
>>>>   <name>chukwa.post.demux.data.loader</name>
>>>>   <value>org.apache.hadoop.chukwa.dataloader.MetricDataLoaderPool,org.apache.hadoop.chukwa.dataloader.FSMDataLoader</value>
>>>> </property>
>>>>
>>>> in Chukwa-demux.conf.
>>>>
>>>> The rough picture of the data flow looks like this:
>>>>
>>>> 1. demux -> generates ChukwaRecord outputs.
>>>> 2. archive -> generates bigger files by compacting data sink files
>>>>    (concurrent with step 1).
>>>> 3. postProcess -> looks up which files were generated by the demux
>>>>    process and dispatches them to the different data loaders.
>>>> 4. MetricDataLoaderPool -> dispatches multiple threads to load
>>>>    ChukwaRecord files through different MetricDataLoaders.
>>>> 5. MetricDataLoader -> loads the sequence files into the database by
>>>>    record type, as defined in mdl.xml.
>>>> 6. Each HICC widget has a descriptor written in JSON. You can find the
>>>>    widget descriptor files in hdfs://namenode:port/chukwa/hicc/widgets;
>>>>    they embed the full SQL template, like:
>>>>
>>>>    Query="select cpu_user_pcnt from [system_metrics] where timestamp
>>>>    between [start] and [end]"
>>>>
>>>> This will output the metrics in JSON format and the HICC graphing widget
>>>> will render the graph.
>>>>
>>>> If there is no data, look at postProcess.log and make sure the data
>>>> loading is not throwing exceptions. Steps 3 to 6 are deprecated and will
>>>> be replaced with something else. Hope this helps.
>>>>
>>>> Regards,
>>>> Eric
>>>>
>>>> On 3/17/10 4:16 PM, "Kirk True" <k...@mustardgrain.com> wrote:
>>>>
>>>>> Hi Eric,
>>>>>
>>>>> Eric Yang wrote:
>>>>>
>>>>>> Hi Kirk,
>>>>>>
>>>>>> I am working on a design which removes MySQL from Chukwa. I am making
>>>>>> this departure from MySQL because the MDL framework was built for
>>>>>> prototyping purposes. It will not scale in a production system where
>>>>>> Chukwa could be hosted on a large Hadoop cluster. HICC will serve data
>>>>>> directly from HDFS in the future.
>>>>>>
>>>>>> Meanwhile, dbAdmin.sh from Chukwa 0.3 is still compatible with the
>>>>>> trunk version of Chukwa. You can load ChukwaRecords using the
>>>>>> org.apache.hadoop.chukwa.dataloader.MetricDataLoader class or mdl.sh
>>>>>> from Chukwa 0.3.
>>>>>
>>>>> I'm to the point where the "df" example is working and demux is storing
>>>>> ChukwaRecord data in HDFS. When I run dbAdmin.sh from 0.3.0, no data is
>>>>> getting updated in the database.
>>>>>
>>>>> My question is: what's the process to get a custom Demux implementation
>>>>> to be viewable in HICC? Are the database tables magically created and
>>>>> populated for me? Does HICC generate a widget for me?
>>>>>
>>>>> HICC looks very nice, but when I try to add a widget to my dashboard,
>>>>> the preview always reads, "No Data Available." I'm running
>>>>> $CHUKWA_HOME/bin/start-all.sh followed by $CHUKWA_HOME/bin/dbAdmin.sh
>>>>> (which I've manually copied to the bin directory).
>>>>>
>>>>> What am I missing?
>>>>>
>>>>> Thanks,
>>>>> Kirk
>>>>>
>>>>>> The MetricDataLoader class will be marked as deprecated, and it will
>>>>>> not be supported once we make the transition to Avro + Tfile.
>>>>>>
>>>>>> Regards,
>>>>>> Eric
>>>>>>
>>>>>> On 3/15/10 11:56 AM, "Kirk True" <k...@mustardgrain.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I recently switched to trunk as I was experiencing a lot of issues
>>>>>>> with 0.3.0. In 0.3.0, there was a dbAdmin.sh script that would run
>>>>>>> and try to stick data in MySQL from HDFS. However, that script is
>>>>>>> gone, and when I run the system as built from trunk, nothing is ever
>>>>>>> populated in the database. Where are the instructions for setting up
>>>>>>> the HDFS -> MySQL data migration for HICC?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kirk
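To make step 6 above a bit more concrete: HICC expands the widget's SQL template by substituting the [system_metrics], [start], and [end] placeholders and then runs the result against MySQL. The snippet below is only a rough illustration of that expansion, not HICC's actual code; the partitioned table name, the timestamp format, and the JDBC settings are guesses.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.sql.Timestamp;

    public class WidgetQueryIllustration {
      public static void main(String[] args) throws Exception {
        // Template as it appears in the widget descriptor.
        String template =
            "select cpu_user_pcnt from [system_metrics] where timestamp between [start] and [end]";

        long end = System.currentTimeMillis();
        long start = end - 12L * 60 * 60 * 1000; // "Last 12 Hours"

        // Guess: [system_metrics] resolves to a time-partitioned table such as
        // system_metrics_2098_week, and timestamps are stored as DATETIME values.
        String sql = template
            .replace("[system_metrics]", "system_metrics_2098_week")
            .replace("[start]", "'" + new Timestamp(start) + "'")
            .replace("[end]", "'" + new Timestamp(end) + "'");

        Connection conn = DriverManager.getConnection(
            "jdbc:mysql://localhost:3306/chukwa", "user", "password");
        try {
          Statement stmt = conn.createStatement();
          ResultSet rs = stmt.executeQuery(sql);
          while (rs.next()) {
            System.out.println(rs.getDouble("cpu_user_pcnt"));
          }
        } finally {
          conn.close();
        }
      }
    }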