Re: [Chennaipy] MOM - September 2018 Chennaipy Meetup

Vijay Kumar Sun, 14 Oct 2018 19:57:11 -0700

Hi Bharathwaaj,

Thanks for the minutes. The description of my talk in the minutes is a little terse, and might not accurately reflect what I said. A more accurate description can be obtained from my slides, available at https://www.dropbox.com/s/uwzvgvfrs7o0nhf/slides.html?dl=1


Regards,
Vijay



On Sunday 07 October 2018 10:35 AM, Bharathwaaj S wrote:

Hello,

Apologies for the delay. Please find below the minutes of the September 2018 meetup.

*Data Compression Techniques*
Data compression involves minimizing the size in bytes without degrading quality to an unacceptable level. Compression can be lossy or lossless.
But how can we measure information? Information theory provides a solution: it defines a unit of information. Uncertainty, information and entropy are the terms used in information theory. An unlikely event has a low probability and therefore carries a lot of information; entropy is the average information over all possible events. (For example, "it is hot in Chennai" is hardly information, but "snow in Chennai" is.)
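The link between probability and information can be made concrete with a small sketch. The probabilities below are made up purely for illustration and were not part of the talk; the function names are also just illustrative.

```python
import math


def self_information(p: float) -> float:
    """Information content, in bits, of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def entropy(probabilities) -> float:
    """Shannon entropy in bits: the expected self-information."""
    return sum(p * self_information(p) for p in probabilities if p > 0)


# Hypothetical probabilities, chosen only to illustrate the idea.
p_hot = 0.99     # "it is hot in Chennai"
p_snow = 0.0001  # "snow in Chennai"

print("hot :", self_information(p_hot), "bits")
print("snow:", self_information(p_snow), "bits")
print("fair coin entropy:", entropy([0.5, 0.5]), "bits")
```

The rarer event yields the larger number of bits, which matches the intuition in the example above.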
To compress data, instead of coding every symbol with the same number of bits, we can choose each codeword based on the symbol's probability of occurrence. The Huffman coding algorithm uses this method to achieve lossless data compression: it maps symbols to probability-based codewords, so that frequent symbols get shorter codewords.
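As a rough illustration of Huffman coding (a minimal sketch, not the code shown in the talk), the following builds a code from symbol frequencies in a sample string and checks that decoding recovers the original. The helper names and the sample message are assumptions made for this example.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict:
    """Build a prefix code in which frequent symbols get shorter codewords."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: codeword_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codewords.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(text: str, code: dict) -> str:
    return "".join(code[ch] for ch in text)


def decode(bits: str, code: dict) -> str:
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # unambiguous because the code is prefix-free
            out.append(inverse[current])
            current = ""
    return "".join(out)


message = "it is hot in chennai"
code = huffman_code(message)
bits = encode(message, code)
print(len(message) * 8, "bits as 8-bit characters")
print(len(bits), "bits with the Huffman code")
print("round trip ok:", decode(bits, code) == message)
```

The bit counts ignore the cost of storing the code table itself, which a real compressor would also have to transmit.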
Information theory is a well-developed field, and many ideas in data science are drawn from it. On a lighter note, the same idea was already used in Morse code in 1836, before Shannon formalised it in 1948.
*Last mile problem in ML*
Software engineering involves writing a function which takes an input and gives an output. Machine learning involves learning a good function, which is called a model.
For ML we now have dead simple APIs with abundant power. The cornerstone of science is repeatable results. Since data science involves science, it is important to produce repeatable results and hence to track experiments. When this is not done, we end up with zombie models.
We need a way to do the following (wishlist):
- Remember what training data was used
- Remember what code was used
- Remember configuration and hyperparameters used
- Remember results
- Save model
- Compare the results
We have a tool called MLflow which provides these. They can be achieved with the help of APIs such as set_tracking_uri, start_run, log_param, log_metric and log_artifact. We could also deploy to AWS SageMaker.
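To show what such tracking records, here is a standard-library-only sketch of the wishlist. It is not MLflow and does not use MLflow's API; the `RunRecord` class, the `best_run` helper, the file layout and the numbers are all hypothetical, with method names chosen only to echo the ones mentioned above.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


class RunRecord:
    """Hypothetical, minimal experiment record; not MLflow's API."""

    def __init__(self, root: Path, name: str):
        self.dir = Path(root) / name
        self.dir.mkdir(parents=True, exist_ok=True)
        self.data = {"name": name, "started": time.time(),
                     "params": {}, "metrics": {}, "artifacts": {}}

    def log_param(self, key, value):
        self.data["params"][key] = value

    def log_metric(self, key, value):
        self.data["metrics"][key] = value

    def log_artifact(self, path):
        # Store a content hash so the exact data or model file can be identified later.
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        self.data["artifacts"][str(path)] = digest

    def save(self) -> Path:
        out = self.dir / "run.json"
        out.write_text(json.dumps(self.data, indent=2, sort_keys=True))
        return out


def best_run(root: Path, metric: str) -> str:
    """Compare saved runs and return the name of the one with the highest metric."""
    runs = [json.loads(p.read_text()) for p in Path(root).glob("*/run.json")]
    return max(runs, key=lambda r: r["metrics"][metric])["name"]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    train = root / "train.csv"
    train.write_text("x,y\n1,2\n")  # stand-in for real training data
    # Made-up hyperparameters and results, for illustration only.
    for name, lr, acc in [("run-a", 0.1, 0.81), ("run-b", 0.01, 0.87)]:
        run = RunRecord(root / "runs", name)
        run.log_param("learning_rate", lr)
        run.log_metric("accuracy", acc)
        run.log_artifact(train)
        run.save()
    winner = best_run(root / "runs", "accuracy")

print("best run:", winner)
```

The code version could be recorded the same way, for instance by logging a commit hash as a parameter.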
The code should be properly structured and should try to expose the models like a library. A sample code structure was shared.
*Pysangamam - Lessons learnt*
Timeline: 2 keynotes, 16 20-minute slots, 16 poster slots, 12 lightning slots. The idea started on Dec 8, 2017.
Zen: local > national > international. Stick to where the base is largest. Use mailing lists in TN.
Prototype before implementing was the rule, and constraints lead to quality. Organizers cannot be speakers, and must ensure the environment is kept clean after the event.
Good part:
All tasks were completed on time. There were rehearsals and the quality was good. Posters were very engaging. Lightning talks were time-managed using a countdown timer. Food was served on time. The reception was positive.
The website was great and social media were kept updated. The name & logo were well appreciated. The venue was spacious and compact. Contributor tickets helped provide discounts to students.
Hard part:
The process was painful. It was difficult to keep up enthusiasm and set the ball rolling. We ensured that the organizers met face to face once every week. Sponsorship was difficult, as there were not enough contacts.
There were few takers for posters, and logo approval took time. Video recording and banners had issues. There was no on-the-spot registration. There were very few volunteers, and food was wasted.
*Importance of unit testing:*
Unit testing makes the product stable and prevents regressions. Good unit test = no network, no DB, no file modification, can run in parallel, no special environment. Artima link: https://www.artima.com/weblogs/viewpost.jsp?thread=126923
Mock external dependencies in unit tests.
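A minimal sketch of mocking an external dependency with the standard library's unittest.mock follows. The `WeatherClient` class and `describe_weather` function are hypothetical, invented for this example; the point is only that the test never touches the network.

```python
import unittest
from unittest import mock


class WeatherClient:
    """Hypothetical client; a real one would make a network call."""

    def fetch_temperature(self, city):
        raise RuntimeError("network access is not allowed in unit tests")


def describe_weather(client, city):
    """Code under test: depends on the client only through its interface."""
    temp = client.fetch_temperature(city)
    return f"{city}: {'hot' if temp >= 30 else 'mild'}"


class DescribeWeatherTest(unittest.TestCase):
    def test_hot_city(self):
        # Replace the external dependency with a mock that returns a canned value.
        client = mock.Mock(spec=WeatherClient)
        client.fetch_temperature.return_value = 38
        self.assertEqual(describe_weather(client, "Chennai"), "Chennai: hot")
        client.fetch_temperature.assert_called_once_with("Chennai")

    def test_mild_city(self):
        client = mock.Mock(spec=WeatherClient)
        client.fetch_temperature.return_value = 22
        self.assertEqual(describe_weather(client, "Ooty"), "Ooty: mild")


# Run the tests programmatically so the script continues afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DescribeWeatherTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Passing the client in as an argument is what makes the swap easy; the tests need no network, database or special environment.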
pip install exam (provides decorators like fixture, before, after). Use flake8.
Importance of logging in a database: disaster recovery.

Kind regards,
Bharath



_______________________________________________
Chennaipy mailing list
Chennaipy@python.org
https://mail.python.org/mailman/listinfo/chennaipy
