Hi Bharathwaaj,
Thanks for the minutes. The description of my talk in the minutes is a little terse and might not accurately reflect what I said. A more accurate description can be found in my slides, available at https://www.dropbox.com/s/uwzvgvfrs7o0nhf/slides.html?dl=1

Regards,
Vijay



On Sunday 07 October 2018 10:35 AM, Bharathwaaj S wrote:
Hello,

Apologies for the delay. Please find below the minutes of the September 2018 meetup.

*Data Compression Techniques*
Data compression involves minimizing the size of data in bytes without degrading quality to an unacceptable level. Compression can be lossy or lossless.

But how can we measure information? Information theory provides the answer. It defines a unit of information (the bit). Uncertainty, information and entropy are the key terms used in information theory. An unlikely event has low probability and therefore carries a lot of information; entropy is the average information over all possible outcomes. (For example, "it is hot in Chennai" carries almost no information, but "it is snowing in Chennai" carries a lot.)
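To make the Chennai example concrete, here is a minimal sketch that computes the information of a single event and the entropy of a distribution, both in bits. The probabilities are made up for illustration only, and the function names are not from the talk.

```python
import math


def information_bits(p):
    """Self-information of an event with probability p, in bits."""
    return -math.log2(p)


def entropy_bits(probs):
    """Shannon entropy: the expected self-information over all outcomes."""
    return sum(p * information_bits(p) for p in probs if p > 0)


# Hypothetical probabilities, chosen only for illustration.
p_hot, p_snow = 0.999, 0.001

print("hot in Chennai :", information_bits(p_hot), "bits")
print("snow in Chennai:", information_bits(p_snow), "bits")
print("fair coin entropy:", entropy_bits([0.5, 0.5]), "bits")
print("hot/snow entropy :", entropy_bits([p_hot, p_snow]), "bits")
```

The rare event carries far more bits than the common one, while the entropy of the hot/snow distribution is small because the outcome is almost never a surprise.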

When data needs to be compressed, instead of coding every symbol with the same number of bits, we can choose codewords based on each symbol's probability of occurrence, giving shorter codewords to more probable symbols. The Huffman coding algorithm uses this method to achieve lossless data compression. It maps symbols to probability-based codewords.
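The idea can be sketched in a few lines of Python. This is an illustrative implementation written for these minutes, not code from the talk; the function names and the sample string are assumptions.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol in text to a prefix-free bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codewords.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    reverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free, so the first match is the symbol
            out.append(reverse[current])
            current = ""
    return "".join(out)


text = "abracadabra"
code = huffman_code(text)
bits = encode(text, code)
print(code)
print(len(bits), "bits instead of", 8 * len(text))
```

The most frequent symbol ("a") gets the shortest codeword, and decoding recovers the original text exactly, which is what makes the scheme lossless.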

Information theory is a well-developed field, and data science draws many ideas from it. On a lighter note, the idea of probability-based codewords was already used in Morse code in 1836, before Shannon formalised it in 1948.

*Last mile problem in ML*
In software engineering, we write a function which takes an input and gives an output. In machine learning, we look for a good function, which is called a model.

For ML we now have dead simple APIs with abundant power. The cornerstone of science is repeatable results. Since data science involves science, it is important to produce repeatable results and hence to track experiments. When this is not done, we end up with zombie models.

We need a way to do the following (wishlist):
- Remember what training data was used
- Remember what code was used
- Remember configuration and hyperparameters used
- Remember results
- Save model
- Compare the results

We have a tool called mlflow which provides these. They can be achieved with the help of APIs such as set_tracking_uri, start_run, log_param, log_metric and log_artifact. We could also deploy to AWS SageMaker.
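As a rough illustration of what the wishlist asks a tracking tool to record, here is a hand-rolled sketch using only the Python standard library. It is not mlflow's API: `track_run`, `best_run` and the directory layout are hypothetical names invented for this example, and the data, parameters and metrics are made up.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def track_run(run_dir, data_bytes, code_version, params, metrics, model_bytes):
    """Write one experiment's record to run_dir and return the record."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.time(),
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),  # which training data
        "code_version": code_version,  # which code, e.g. a git commit hash
        "params": params,  # configuration and hyperparameters
        "metrics": metrics,  # results
    }
    (run_dir / "model.bin").write_bytes(model_bytes)  # saved model
    (run_dir / "run.json").write_text(json.dumps(record, indent=2))
    return record


def best_run(root, metric):
    """Compare all runs under root; return (run_name, value) with the highest metric."""
    runs = []
    for path in Path(root).glob("*/run.json"):
        record = json.loads(path.read_text())
        runs.append((path.parent.name, record["metrics"][metric]))
    return max(runs, key=lambda run: run[1])


# Two made-up runs, stored in a temporary directory for the demo.
with tempfile.TemporaryDirectory() as root:
    track_run(Path(root) / "run1", b"training data", "commit-a",
              {"lr": 0.1}, {"accuracy": 0.80}, b"model-1")
    track_run(Path(root) / "run2", b"training data", "commit-b",
              {"lr": 0.01}, {"accuracy": 0.90}, b"model-2")
    winner = best_run(root, "accuracy")

print(winner)
```

A dedicated tracking tool does the same bookkeeping with far less manual effort, and adds a UI and remote storage on top.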

The code should be well structured and should expose the models like a library. A sample code structure was shared.

*PySangamam - Lessons learnt*
Schedule: 2 keynotes, 16 twenty-minute talk slots, 16 poster slots, 12 lightning talk slots. The idea started on Dec 8, 2017.

Zen: local > national > international. Stick to where the community base is largest. Use mailing lists in TN.

The rule was to prototype before implementing, and constraints led to quality. Organizers cannot be speakers, and the organizers must ensure the venue is left clean after the event.

Good part:
All tasks were completed on time. There were rehearsals and the quality was good. Posters were very engaging. Lightning talk time was managed using a countdown timer. Food was served on time. The reception was positive.

The website was great and the social media accounts were kept updated. The name and logo were well appreciated. The venue was spacious yet compact. Contributor tickets helped provide discounts to students.

Hard part:
The process was painful. It was difficult to keep up the enthusiasm and set the ball rolling. The organizers made sure to meet face to face once every week. Sponsorship was difficult, and there were not enough contacts.

There were few takers for posters, and logo approval took time. Video recording and banners had issues. There was no on-the-spot registration. There were very few volunteers, and food was wasted.

*Importance of unit testing*
Unit testing makes the product stable and prevents regressions. A good unit test uses no network, no database and no file modification, can run in parallel with other tests, and needs no special environment. Artima link: https://www.artima.com/weblogs/viewpost.jsp?thread=126923

Mock external dependencies in unit tests.
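A minimal sketch of mocking a network dependency with the standard library's unittest.mock follows. The function `fetch_status` and the URL are hypothetical, invented for this example; they are not from the talk.

```python
import unittest
import urllib.request
from unittest import mock


def fetch_status(url):
    """Return the HTTP status code for url (touches the network unless mocked)."""
    with urllib.request.urlopen(url) as response:
        return response.status


class FetchStatusTest(unittest.TestCase):
    def test_returns_status_without_network(self):
        fake_response = mock.MagicMock()
        fake_response.status = 200
        # urlopen is used as a context manager, so configure __enter__.
        fake_response.__enter__.return_value = fake_response
        with mock.patch("urllib.request.urlopen",
                        return_value=fake_response) as fake_open:
            self.assertEqual(fetch_status("https://example.invalid/"), 200)
        # The real network call was replaced and invoked exactly once.
        fake_open.assert_called_once_with("https://example.invalid/")
```

Saved in a test module, this can be run with `python -m unittest`. Because the network call is replaced by a mock, the test is fast, deterministic and safe to run in parallel.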

pip install exam (provides decorators such as fixture, before and after). Use flake8 for linting.

Importance of logging in databases: it enables disaster recovery.

Kind regards,
Bharath



_______________________________________________
Chennaipy mailing list
Chennaipy@python.org
https://mail.python.org/mailman/listinfo/chennaipy
