Hello Marcus,

Thank you for your encouraging response. I agree that testing, proper
documentation, and a tutorial will be very important. Testing all the layers
and metrics that I add to mlpack shouldn't be very hard, and hopefully by
then the models repo will have been restructured to support tests, tutorials,
and ready-to-deploy models. For object localization, we could implement the
following tests:

1. Load weights and run for a few epochs (same as the other models).
2. Take random images from the validation dataset and assert a minimum IoU,
chosen leniently enough that the test doesn't fail spuriously yet still
shows the model is working (see the sketch after this list).
3. Run the classification accuracy test that I am currently writing for the
object detection models in the repo as part of the restructuring.
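
To make test (2) concrete, the IoU computation could look something like the
minimal standalone sketch below; the box representation (corner coordinates)
is my assumption, and this isn't tied to any existing mlpack metric class.

#include <algorithm>

// Axis-aligned box given by its top-left and bottom-right corners.
struct Box { double x1, y1, x2, y2; };

// Intersection over Union of two boxes; returns a value in [0, 1].
double IoU(const Box& a, const Box& b)
{
  const double w = std::max(0.0, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
  const double h = std::max(0.0, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
  const double intersection = w * h;
  const double areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  const double areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  return intersection / (areaA + areaB - intersection);
}

// The test would then assert IoU(predicted, groundTruth) >= threshold
// for a lenient threshold such as 0.5.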

I think the most important parts will be the API (to increase flexibility,
especially for a model such as YOLO), proper documentation, and a tutorial.
A user should be able to point us at a video or an image, and we directly
save the frames / video with the bounding boxes drawn on them. A CLI would be
very useful here; a rough sketch of the kind of front end I have in mind
follows.
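
Something like this, where the class and method names are only illustrative
and not an existing mlpack API:

#include <string>

// Hypothetical detection front end; a CLI binding could wrap this.
class YOLODetector
{
 public:
  // Construct from saved model weights (loading elided in this sketch).
  explicit YOLODetector(const std::string& /* weightsPath */) { }

  // Detect objects in one image and save an annotated copy.
  // (Forward pass and box drawing elided in this sketch.)
  void AnnotateImage(const std::string& inPath, const std::string& outPath);

  // The same idea applied frame by frame to a video file.
  void AnnotateVideo(const std::string& inPath, const std::string& outPath);
};

// Intended usage:
//   YOLODetector detector("yolo_weights.bin");
//   detector.AnnotateVideo("in.mp4", "out_with_boxes.mp4");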

For documentation, I think I will add a README in each folder of the models
repo describing uses, parameters, and function calls. Tutorials might be the
part that takes the most time, because they need to be simple enough that a
user can follow them without understanding all of the underlying code.

The differences between the models are minor, so I think supporting other
models should be straightforward.

And yes, I agree: if we add some more layers and some residual blocks to
DarkNet-19 we get DarkNet-53, so I can do the same thing that I did with
LeNet (v1, v4 and v5). We can have a single DarkNet class that adds layers
according to the version, and then add an alias for each version so that
users can call DarkNet19 and DarkNet53 directly.
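
Roughly like this (a sketch of the structure only; the class name is
hypothetical and the actual layer construction is elided):

#include <cstddef>

// One DarkNet class, parameterized on the version, mirroring what I did
// for LeNet. Layer construction is elided; in mlpack it would be a
// series of model.Add<...>() calls.
template<std::size_t Version>
class DarkNet
{
 public:
  DarkNet()
  {
    // Both versions share the initial convolutional stem.
    if (Version == 53)
    {
      // DarkNet-53 adds more layers and residual blocks on top of
      // the DarkNet-19 backbone.
    }
  }
};

// Aliases so users can write DarkNet19 or DarkNet53 directly.
using DarkNet19 = DarkNet<19>;
using DarkNet53 = DarkNet<53>;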

Yeah, time-consuming indeed, but possible; we have a bunch of machines we
could provide for training models.

That would be really great. I think loading the Darknet weights into YOLO
makes sense, so that at least that portion of the model doesn't have to
change much in terms of weights. That would reduce the overall training time,
and since we are already training Darknet we can reuse those weights
directly.
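
A minimal sketch of the weight reuse, assuming (as with mlpack's FFN) that
both models expose their weights as one flat arma::mat through Parameters(),
and that the shared backbone occupies the first backboneSize entries of each;
TransferBackbone is a hypothetical helper, not an existing mlpack function:

#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>

void TransferBackbone(const mlpack::ann::FFN<>& darknet,
                      mlpack::ann::FFN<>& yolo,
                      const size_t backboneSize)
{
  // Copy the trained convolutional backbone; the detection head keeps
  // its fresh initialization and is trained from scratch.
  yolo.Parameters().rows(0, backboneSize - 1) =
      darknet.Parameters().rows(0, backboneSize - 1);
}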

Later on, when bindings for other languages are added, I think this will
prove to be a very useful model, especially on devices like the Raspberry Pi.
After I implement this, I would run inference on real-time video on an RPi 3,
and we could add those results to the README as well. I would love to hear
your opinions on all of this so that I can improve the proposal further.
Thanks a lot.

Regards,
Kartik Dutt,
GitHub ID: kartikdutt18
