Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

Nirmal Fernando Thu, 13 Aug 2015 21:43:44 -0700

On Fri, Aug 14, 2015 at 10:01 AM, Thushan Ganegedara <[email protected]>
wrote:


> Hi,
>
>> This was mainly due to the detection of a numerical feature as a
>> categorical one.
>
> Oh, it makes sense now. Why don't we take a sample of the data and, if
> the sample contains only integers (or doubles without any decimals) or
> strings, consider the feature categorical?
>

I tried that approach too, but it breaks on some datasets. For example, the
normalized-losses feature of the automobile dataset has integer values
(0-164), yet it is probably not categorical.
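The sampling rule proposed in the quoted message, and the objection to it, can be illustrated with a minimal Java sketch. The class and method names are hypothetical and this is not WSO2 ML's actual feature-type detection; it only assumes that a column sample arrives as a list of raw string values.

```java
import java.util.List;

class IntegralSampleHeuristic {

    // Hypothetical rule from the thread: a column "looks categorical" when every
    // sampled value is either a non-numeric string or a number with no fractional part.
    static boolean looksCategorical(List<String> sample) {
        for (String raw : sample) {
            double value;
            try {
                value = Double.parseDouble(raw.trim());
            } catch (NumberFormatException e) {
                continue; // non-numeric string: consistent with a categorical column
            }
            if (value != Math.rint(value)) {
                return false; // a genuine decimal, so treat the column as numerical
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Class labels are whole numbers, so the rule flags them as categorical.
        System.out.println(looksCategorical(List.of("1", "12", "36")));   // true
        // Integer values in the 0-164 range are flagged too, even when the
        // feature is probably not categorical: the weakness raised above.
        System.out.println(looksCategorical(List.of("0", "101", "164"))); // true
        // Any value with a fractional part makes the column numerical.
        System.out.println(looksCategorical(List.of("3", "2.5")));        // false
    }
}
```

Both of the first two samples pass the check, so the rule on its own cannot tell class labels apart from integer-valued measurements, which is exactly the objection in the reply.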

>
>> We suggested increasing the categorical threshold as a work-around.
>> @thushan did it work?
>
> Yes, it worked after increasing the threshold to 40.
>
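To see why raising the threshold can change how the class column is treated, here is a minimal sketch. It assumes the categorical threshold is compared against the number of distinct values in a column (the thread does not spell out the exact rule, so treat this as an assumption). The names are hypothetical, the labels 1.0 to 36.0 mirror the class range mentioned further down in the thread, and 20 is an arbitrary lower threshold chosen only for contrast.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

class CategoricalThresholdSketch {

    // Assumed rule: a column is categorical when it has at most
    // `categoricalThreshold` distinct values.
    static boolean isCategorical(List<Double> column, int categoricalThreshold) {
        return new HashSet<>(column).size() <= categoricalThreshold;
    }

    public static void main(String[] args) {
        // Illustrative class column: labels 1.0 .. 36.0, each appearing twice.
        List<Double> labels = new ArrayList<>();
        for (int c = 1; c <= 36; c++) {
            labels.add((double) c);
            labels.add((double) c);
        }
        System.out.println(isCategorical(labels, 20)); // false: 36 distinct values > 20
        System.out.println(isCategorical(labels, 40)); // true: 36 distinct values <= 40
    }
}
```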
> On Fri, Aug 14, 2015 at 2:21 PM, Nirmal Fernando <[email protected]> wrote:
>
>> This was mainly due to the detection of a numerical feature as a
>> categorical one.
>>
>> We suggested increasing the categorical threshold as a work-around.
>> @thushan did it work?
>>
>> On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara <[email protected]>
>> wrote:
>>
>>> This issue occurs if I set the response variable as a categorical
>>> variable. If I keep it as a numerical variable, the values are read
>>> correctly.
>>>
>>> So I presume there is a fault in the categorical conversion of the
>>> variable.
>>>
>>> On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara <[email protected]>
>>> wrote:
>>>
>>>> I still get the same result:
>>>>
>>>> 1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0
>>>> 1.0     1.0     1.0     12.0    12.0    12.0    12.0    12.0    12.0
>>>> 12.0    12.0    12.0    12.0    13.0    13.0    13.0    13.0    13.0    13.0
>>>> 13.0    13.0    13.0    13.0    14.0    14.0    14.0    14.0    14.0
>>>> 14.0    14.0    14.0    15.0    15.0    15.0    15.0    15.0    15.0
>>>> 15.0    15.0    15.0    15.0    15.0    15.0    16.0    16.0    16.0    16.0
>>>> 16.0    16.0    16.0    16.0    17.0    17.0    17.0    17.0    17.0
>>>> 17.0    17.0    17.0    17.0    17.0    18.0    18.0    18.0    18.0
>>>> 18.0    18.0    18.0    18.0    18.0    18.0    18.0    19.0    19.0    19.0
>>>> 19.0    19.0    19.0    19.0    19.0    19.0    19.0    19.0    19.0
>>>> 19.0    19.0    2.0     2.0     2.0     2.0     2.0     2.0     2.0
>>>> 2.0     2.0     2.0     2.0     2.0     2.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     4.0     4.0     4.0     4.0     4.0     4.0
>>>> 4.0     4.0     4.0     4.0     4.0     4.0     5.0     5.0     5.0     5.0
>>>> 5.0     5.0     5.0     5.0     5.0     5.0     5.0     5.0     5.0
>>>> 6.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0
>>>> 6.0     6.0     6.0     7.0     7.0     7.0     7.0     7.0     7.0     7.0
>>>> 7.0     7.0     7.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>> 3.0     3.0     3.0     3.0
>>>>
>>>> On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando <[email protected]>
>>>> wrote:
>>>>
>>>>> Can you try the following code?
>>>>>
>>>>> // Collect the RDD once instead of on every loop iteration.
>>>>> List<LabeledPoint> points = labeledPoints.collect();
>>>>> for (int i = 0; i < points.size(); i++) {
>>>>>     System.out.print(points.get(i).label() + "\t");
>>>>> }
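When the question is whether labels are being misread, counting how often each label occurs is easier to inspect than a raw dump: one label with an outsized count, or no keys near the expected maximum of 36.0, shows up at a glance. A minimal, self-contained sketch in plain Java; in the model builder the input list would come from calling label() on each collected LabeledPoint, and the class and method names here are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LabelHistogram {

    // Counts occurrences of each label; TreeMap keeps the keys in ascending order.
    static Map<Double, Integer> countLabels(List<Double> labels) {
        Map<Double, Integer> counts = new TreeMap<>();
        for (Double label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the labels of the collected points.
        List<Double> labels = List.of(1.0, 1.0, 12.0, 3.0, 3.0, 3.0);
        System.out.println(countLabels(labels)); // {1.0=2, 3.0=3, 12.0=1}
    }
}
```

Assuming labeledPoints is a JavaRDD<LabeledPoint>, Spark's countByValue() on the mapped labels (for example labeledPoints.map(p -> p.label()).countByValue()) returns similar counts as a Map<Double, Long> without bringing every point back to the driver.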
>>>>>
>>>>> On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I used the following snippet:
>>>>>>
>>>>>> for (int i = 0; i < labeledPoints.collect().size(); i++) {
>>>>>>     System.out.print(labeledPoints.collect().get(i).label() + "\t");
>>>>>> }
>>>>>>
>>>>>> inside the public MLModel build() throws MLModelBuilderException
>>>>>> method in DeeplearningModelBuilder.java.
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Thushan,
>>>>>>>
>>>>>>> We need more info. What exactly did you print, and where?
>>>>>>>
>>>>>>> On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I found a potential cause of the poor accuracy on the leaf
>>>>>>>> dataset: the data read into ML seems to be wrong.
>>>>>>>>
>>>>>>>> I have attached the data file as a CSV (the classes are in the last
>>>>>>>> column).
>>>>>>>>
>>>>>>>> However, when I print out the labels (classes) of the data that was
>>>>>>>> read, they look like the output below. Clearly there should not be
>>>>>>>> this many "3.0" labels, and there should be classes up to 36.0.
>>>>>>>>
>>>>>>>> Is this caused by a bug?
>>>>>>>>
>>>>>>>> 1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0
>>>>>>>> 1.0     1.0     1.0     1.0     12.0    12.0    12.0    12.0    12.0
>>>>>>>> 12.0    12.0    12.0    12.0    12.0    13.0    13.0    13.0    13.0    13.0    13.0
>>>>>>>> 13.0    13.0    13.0    13.0    14.0    14.0    14.0    14.0
>>>>>>>> 14.0    14.0    14.0    14.0    15.0    15.0    15.0    15.0    15.0
>>>>>>>> 15.0    15.0    15.0    15.0    15.0    15.0    15.0    16.0    16.0    16.0    16.0
>>>>>>>> 16.0    16.0    16.0    16.0    17.0    17.0    17.0    17.0
>>>>>>>> 17.0    17.0    17.0    17.0    17.0    17.0    18.0    18.0    18.0
>>>>>>>> 18.0    18.0    18.0    18.0    18.0    18.0    18.0    18.0    19.0    19.0    19.0
>>>>>>>> 19.0    19.0    19.0    19.0    19.0    19.0    19.0    19.0
>>>>>>>> 19.0    19.0    19.0    2.0     2.0     2.0     2.0     2.0     2.0
>>>>>>>> 2.0     2.0     2.0     2.0     2.0     2.0     2.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     4.0     4.0     4.0     4.0     4.0
>>>>>>>> 4.0     4.0     4.0     4.0     4.0     4.0     4.0     5.0     5.0     5.0     5.0
>>>>>>>> 5.0     5.0     5.0     5.0     5.0     5.0     5.0     5.0
>>>>>>>> 5.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0
>>>>>>>> 6.0     6.0     6.0     6.0     7.0     7.0     7.0     7.0     7.0     7.0     7.0
>>>>>>>> 7.0     7.0     7.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>> 3.0     3.0     3.0     3.0
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Thushan Ganegedara
>>>>>>>> School of IT
>>>>>>>> University of Sydney, Australia
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Thanks & regards,
>>>>>>> Nirmal
>>>>>>>
>>>>>>> Team Lead - WSO2 Machine Learner
>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>> Mobile: +94715779733
>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>
>
>




_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
