Hi Bharathwaaj,
Thanks for the minutes. The description of my talk, in the minutes, is a
little terse, and might not accurately reflect what I said. A more
accurate description can be obtained from my slides available at
https://www.dropbox.com/s/uwzvgvfrs7o0nhf/slides.html?dl=1
Regards,
Vijay
On Sunday 07 October 2018 10:35 AM, Bharathwaaj S wrote:
Hello,
Apologies for the delay. Please find below the minutes of the September
2018 meetup.
*Data Compression Techniques*
Data compression means reducing the size of data in bytes without
degrading its quality to an unacceptable level. Compression can be
lossy or lossless.
But how can we measure information? Information theory provides a
solution: it defines a unit of information. Uncertainty, information
and entropy are terms used in information theory. If an event is
uncertain, it has a low probability, and hence observing it conveys a
lot of information; entropy is the average information over all
possible outcomes. (For example, "it is hot in Chennai" is hardly
information, but "it is snowing in Chennai" is.)
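As a rough illustration of the point above, here is a minimal sketch
that measures information in bits. The probabilities for the two
Chennai events are made up for the example and are not from the talk.

```python
import math


def self_information(p):
    """Information, in bits, gained by observing an event of probability p."""
    return -math.log2(p)


def entropy(probabilities):
    """Shannon entropy in bits: the average self-information of a distribution."""
    return sum(p * self_information(p) for p in probabilities if p > 0)


# Hypothetical probabilities, chosen only to make the arithmetic easy.
p_hot_in_chennai = 0.5
p_snow_in_chennai = 1 / 1024

print(self_information(p_hot_in_chennai))   # 1.0 bit: barely news
print(self_information(p_snow_in_chennai))  # 10.0 bits: very surprising
print(entropy([0.5, 0.5]))                  # 1.0 bit for a fair coin toss
```

The rarer the event, the more bits it is worth, which is what makes
probability-based codewords pay off.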
To compress data, instead of coding every symbol with the same number
of bits, we can choose the codeword for each symbol based on its
probability of occurrence, so that frequent symbols get shorter
codewords. The Huffman coding algorithm uses this method to achieve
lossless data compression. It maps symbols to probability-based
codewords.
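The idea can be sketched in a few lines of Python. This is a minimal
illustration of Huffman coding written for these minutes, not code
from the talk; the function names and the sample string are
assumptions.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol in text to a binary codeword string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: codeword_so_far}).
    # The tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    """Concatenate the codeword of every symbol in text."""
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Invert encode(); works because Huffman codes are prefix-free."""
    reverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


sample = "abracadabra"  # arbitrary sample input
code = huffman_code(sample)
bits = encode(sample, code)
print(code)
print(len(bits), "bits instead of", 8 * len(sample))
print(decode(bits, code) == sample)
```

The most frequent symbol ("a" here) ends up with the shortest
codeword, and decoding recovers the input exactly, which is what makes
the scheme lossless.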
Information theory is a well-developed field, and data science draws
many ideas from it. On a lighter note, this idea was already put to
use in Morse code in 1836, before Shannon formalised it in 1948.
*Last mile problem in ML*
Software engineering involves writing a function which takes an input
and gives an output. Machine learning involves finding a good
function, which is called a model.
For ML we now have dead-simple APIs with abundant power. The
cornerstone of science is repeatable results. Since data science is a
science, it is important to produce repeatable results and hence to
track experiments. When this is not done, we end up with zombie
models.
We need a way to do the following (wishlist):
- Remember what training data was used
- Remember what code was used
- Remember the configuration and hyperparameters used
- Remember the results
- Save the model
- Compare the results
A tool called MLflow provides these. This can be achieved with APIs
such as set_tracking_uri, start_run, log_param, log_metric and
log_artifact. Models can also be deployed to AWS SageMaker.
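A minimal sketch of how these tracking calls fit together is given
below. It assumes the mlflow package is installed; the helper name
track_run and the parameter and metric names in the usage comment are
hypothetical, and the sample code shared in the talk may have looked
different.

```python
def track_run(params, metrics, artifact_path=None, tracking_uri=None):
    """Log one experiment run to MLflow and return its run ID.

    params and metrics are plain dicts; artifact_path is an optional
    local file (for example a saved model) to attach to the run.
    """
    # Imported here so that this module still loads where mlflow is absent.
    import mlflow

    if tracking_uri is not None:
        # Point the client at a tracking server or a local directory.
        mlflow.set_tracking_uri(tracking_uri)
    with mlflow.start_run() as run:
        for name, value in params.items():
            mlflow.log_param(name, value)    # configuration, hyperparameters
        for name, value in metrics.items():
            mlflow.log_metric(name, value)   # results (numeric values)
        if artifact_path is not None:
            mlflow.log_artifact(artifact_path)  # saved model or other file
        return run.info.run_id


# Hypothetical usage, with made-up values:
# run_id = track_run({"learning_rate": 0.01}, {"accuracy": 0.93},
#                    artifact_path="model.pkl")
```

Runs logged this way can then be compared side by side in the MLflow
UI, which covers the last item of the wishlist.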
The code should be well structured and should expose the models like a
library. A sample code structure was shared.
*Pysangamam - Lessons learnt*
Timeline: 2 keynotes, 16 slots of 20 minutes, 16 poster slots, 12
lightning talk slots. The idea started on Dec 8, 2017.
Zen - Local > national > international. Stick to where the base is
largest. Use mailing lists in TN.
"Prototype before implementing" was the rule, and constraints lead to
quality. Organizers cannot be speakers, and must ensure the
environment is kept clean after the event.
Good part:
All tasks were completed on time. There were rehearsals and the
quality was good. Posters were very engaging. Lightning talk timing
was managed using a countdown timer. Food was served on time. Reception
was positive.
The website was great and social media were kept updated. The name &
logo were well appreciated. The venue was spacious yet compact.
Contributor tickets helped provide discounts to students.
Hard part:
The process was painful. It was difficult to keep up enthusiasm and
set the ball rolling. The organizers made sure to meet face to face
once every week. Sponsorship was difficult, as there were not enough
contacts.
There were few takers for posters, and logo approval took time. Video
recording and banners had issues. There was no on-the-spot
registration. There were very few volunteers, and food was wasted.
*Importance of unit testing:*
Unit testing makes the product stable and prevents regressions. A good
unit test uses no network and no database, modifies no files, can run
in parallel with other tests, and needs no special environment.
Artima link:
https://www.artima.com/weblogs/viewpost.jsp?thread=126923
Mock external dependencies in unit tests.
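A minimal sketch of mocking an external dependency with the standard
library's unittest.mock follows. The function fetch_status and the URL
are hypothetical, made up for this illustration; the talk may have
used different examples.

```python
import unittest
import urllib.request
from unittest import mock


def fetch_status(url):
    """Return the HTTP status code for url (normally hits the network)."""
    with urllib.request.urlopen(url) as response:
        return response.status


class FetchStatusTest(unittest.TestCase):
    def test_returns_status_without_network(self):
        fake_response = mock.MagicMock()
        fake_response.status = 200
        # urlopen() is used as a context manager, so stub __enter__ too.
        fake_response.__enter__.return_value = fake_response
        # Replace the real urlopen for the duration of the with block.
        with mock.patch("urllib.request.urlopen",
                        return_value=fake_response) as fake_open:
            self.assertEqual(fetch_status("http://example.invalid/"), 200)
        fake_open.assert_called_once_with("http://example.invalid/")
```

Saved as a test module, this can be run with "python -m unittest". The
test never touches the network, so it stays fast and can run in
parallel with others.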
pip install exam (provides decorators like fixture, before, after).
Use flake8 for linting.
Importance of logging in databases - for disaster recovery.
Kind regards,
Bharath
_______________________________________________
Chennaipy mailing list
Chennaipy@python.org
https://mail.python.org/mailman/listinfo/chennaipy