I just sent the PR and fixed a typo in the comment. I also added some comments
and unit tests. Please let me know if you receive the patch.


On Mon, Jan 6, 2014 at 9:18 PM, Michael Kun Yang <kuny...@stanford.edu> wrote:

> I will follow up with the Newton one later.
>
>
> On Mon, Jan 6, 2014 at 9:14 PM, Michael Kun Yang <kuny...@stanford.edu> wrote:
>
>> I just sent the PR for multinomial logistic regression.
>>
>>
>> On Mon, Jan 6, 2014 at 6:26 PM, Michael Kun Yang <kuny...@stanford.edu> wrote:
>>
>>> Thanks, will do.
>>>
>>>
>>> On Mon, Jan 6, 2014 at 6:21 PM, Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> Thanks. Why don't you submit a PR and then we can work on it?
>>>>
>>>> > On Jan 6, 2014, at 6:15 PM, Michael Kun Yang <kuny...@stanford.edu>
>>>> > wrote:
>>>> >
>>>> > Hi Hossein,
>>>> >
>>>> > I can still use LabeledPoint with little modification. Currently I
>>>> > convert the category into a {0, 1} sequence, but I can do the
>>>> > conversion in the body of the methods or functions.
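>>>> >
>>>> > Roughly, the conversion looks like this (a minimal sketch, not the
>>>> > actual patch; encode and k are hypothetical names):
>>>> >
>>>> >   // Encode a category c in {0, ..., k-1} as a {0, 1} indicator
>>>> >   // sequence of length k - 1 (the last category is the reference).
>>>> >   def encode(c: Int, k: Int): Array[Double] =
>>>> >     Array.tabulate(k - 1)(j => if (j == c) 1.0 else 0.0)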
>>>> >
>>>> > In order to make the code run faster, I try not to use the
>>>> > DoubleMatrix abstraction, to avoid memory allocation; another reason
>>>> > is that jblas has no data structure to handle symmetric matrix
>>>> > addition efficiently.
>>>> >
>>>> > My code is not very pretty because I handle matrix operations
>>>> > manually (by indexing).
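>>>> >
>>>> > For example, this is the kind of manual indexing I mean (a sketch
>>>> > only; addOuter is a hypothetical name, with the symmetric matrix
>>>> > stored as a flat upper-triangle array of length d * (d + 1) / 2):
>>>> >
>>>> >   // Accumulate the rank-one update H += w * x * x^T, touching only
>>>> >   // the upper triangle of the symmetric d x d matrix H.
>>>> >   def addOuter(h: Array[Double], x: Array[Double], w: Double): Unit = {
>>>> >     val d = x.length
>>>> >     var idx = 0
>>>> >     var i = 0
>>>> >     while (i < d) {
>>>> >       var j = i
>>>> >       while (j < d) {
>>>> >         h(idx) += w * x(i) * x(j)
>>>> >         idx += 1
>>>> >         j += 1
>>>> >       }
>>>> >       i += 1
>>>> >     }
>>>> >   }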
>>>> >
>>>> > If you think it is ok, I will make a pull request.
>>>> >
>>>> >
>>>> >> On Mon, Jan 6, 2014 at 5:34 PM, Hossein <fal...@gmail.com> wrote:
>>>> >>
>>>> >> Hi Michael,
>>>> >>
>>>> >> This sounds great. Would you please send these as a pull request?
>>>> >> Especially if you can make your Newton method implementation
>>>> >> generic, such that it can later be used by other algorithms, it
>>>> >> would be very helpful. For example, you could add it as another
>>>> >> optimization method under mllib/optimization.
>>>> >>
>>>> >> Was there a particular reason you chose not to use LabeledPoint?
>>>> >>
>>>> >> We have some instructions for contributions here:
>>>> >> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
>>>> >>
>>>> >> Thanks,
>>>> >>
>>>> >> --Hossein
>>>> >>
>>>> >>
>>>> >> On Mon, Jan 6, 2014 at 11:33 AM, Michael Kun Yang <kuny...@stanford.edu>
>>>> >> wrote:
>>>> >>
>>>> >>> I actually have two versions:
>>>> >>> one is based on gradient descent, like the logistic regression in
>>>> >>> mllib; the other is based on Newton iteration. It is not as fast
>>>> >>> as SGD, but we can get all the statistics from it, like deviance,
>>>> >>> p-values and Fisher info.
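>>>> >>>
>>>> >>> Schematically, each Newton iteration is just the following (a
>>>> >>> sketch using jblas' Solve; newtonStep is a hypothetical name):
>>>> >>>
>>>> >>>   import org.jblas.{DoubleMatrix, Solve}
>>>> >>>
>>>> >>>   // One Newton step: solve H * step = g, then w <- w - step,
>>>> >>>   // where g is the gradient and H the Hessian (Fisher info) of
>>>> >>>   // the negative log-likelihood at the current weights w.
>>>> >>>   def newtonStep(w: DoubleMatrix, g: DoubleMatrix,
>>>> >>>                  h: DoubleMatrix): DoubleMatrix =
>>>> >>>     w.sub(Solve.solve(h, g))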
>>>> >>>
>>>> >>> We can get the confusion matrix in both versions.
>>>> >>>
>>>> >>> The gradient descent version is just a modification of logistic
>>>> >>> regression with my own implementation. I did not use the
>>>> >>> LabeledPoint class.
>>>> >>>
>>>> >>>
>>>> >>> On Mon, Jan 6, 2014 at 11:13 AM, Evan Sparks <evan.spa...@gmail.com>
>>>> >>> wrote:
>>>> >>>
>>>> >>>> Hi Michael,
>>>> >>>>
>>>> >>>> What strategy are you using to train the multinomial classifier?
>>>> >>>> One-vs-all? I've got an optimized version of that method that
>>>> >>>> I've been meaning to clean up and commit for a while. In
>>>> >>>> particular, rather than shipping a (potentially very big) model
>>>> >>>> with each map task, I ship it once before each iteration with a
>>>> >>>> broadcast variable. Perhaps we can compare versions and
>>>> >>>> incorporate some of my optimizations into your code?
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> Evan
>>>> >>>>
>>>> >>>>> On Jan 6, 2014, at 10:57 AM, Michael Kun Yang <kuny...@stanford.edu>
>>>> >>>>> wrote:
>>>> >>>>>
>>>> >>>>> Hi Spark-ers,
>>>> >>>>>
>>>> >>>>> I implemented an SGD version of multinomial logistic regression
>>>> >>>>> based on mllib's optimization package. If this classifier is in
>>>> >>>>> the future plans of mllib, I will be happy to contribute my code.
>>>> >>>>>
>>>> >>>>> Cheers
>>>> >>
>>>>
>>>
>>>
>>
>
