ONLINE COURSE – Python for data science, machine learning, and
scientific computing (PDMS02). This course will be delivered live.
https://www.psstatistics.com/course/python-for-data-science-machine-learning-and-scientific-computing-pdms02/

Dates 4 - 8 May 2020
Time zone - UK (GMT)

Please email [email protected] with any questions
Course Overview:
Python is one of the most widely used and highly valued programming
languages in the world, and is especially widely used in data science,
machine learning, and in other scientific computing applications. This
course provides both a general introduction to programming with Python
and a comprehensive introduction to using Python for data science,
machine learning, and scientific computing. The skills learnt will be
of value to any marine mammal researcher handling large data sets.
The major topics that we will cover include the following: the
fundamentals of general purpose programming in Python; using Jupyter
notebooks as a reproducible interactive Python programming
environment; numerical computing using numpy; data processing and
manipulations using pandas; data visualization using matplotlib,
seaborn, ggplot, bokeh, altair, etc; symbolic mathematics using sympy;
data science and machine learning using scikit-learn, keras, and
tensorflow; Bayesian modelling using PyMC3 and PyStan; high
performance computing with Cython, Numba, IPyParallel, and Dask. Overall,
this course aims to provide a solid introduction to Python generally
as a programming language, and to its principal tools for doing data
science, machine learning, and scientific computing. (Note that this
course will focus on Python 3 exclusively given that Python 2 has now
reached its end of life.)

Monday 4th – Classes from 09:30 to 17:30
• Topic 1: The What and Why of Python. In order to provide some
general background and context, we will describe where Python came
from, what its major design principles and intended uses originally
were, and where and how it is currently used. We will see that Python
is now extremely widely used, especially in powering the web, in data
science and machine learning, and in system-level programming. Here,
we also compare and contrast Python and R, given that both are
extremely widely used in data science.
• Topic 2: Installing and setting up Python. There are many ways to
write and execute code in Python. Which to use depends on personal
preference and the type of programming that is being done. Here, we
will explore some of the commonly used Integrated Development
Environments (IDEs) for Python, which include Spyder and PyCharm. We
will also mention and briefly describe Jupyter notebooks, which are
widely used for scientific applications of Python, and are an
excellent tool for doing reproducible interactive work. We will cover
Jupyter more extensively starting on Day 3. Also as part of this
topic, we will describe how to use virtual environments and package
installers such as pip and conda.
• Topic 3: Introduction to Python: Data Structures. We will begin our
coverage of programming with Python by introducing its different data
structures and operations on data structures. This will begin with the
elementary data types such as integers, floats, Booleans, and strings,
and the common operations that can be applied to these data types. We
will then proceed to the so-called collection data structures, which
primarily include lists, dictionaries, tuples, and sets.
• Topic 4: Introduction to Python: Programming. Having introduced
Python’s data types, we will now turn to how to program in Python. We
will begin with iteration, such as the for and while loops. We will
then cover conditionals and functions.
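
As a rough flavour of what Topics 3 and 4 involve, the short sketch below
combines a dictionary, a list of tuples, a set, a function with a
conditional, and for and while loops. It is only an illustration and not
course material: the species counts and the classify function are made up
for this example.

```python
# Hypothetical sighting counts per species (invented for illustration only).
sightings = {"harbour porpoise": 12, "minke whale": 3, "grey seal": 27}


def classify(count, threshold=10):
    """Label a count as 'common' or 'rare' relative to a threshold."""
    if count >= threshold:
        return "common"
    return "rare"


# A for loop over the dictionary's (key, value) pairs, building a list of tuples.
labels = []
unique_labels = set()
for species, count in sightings.items():
    label = classify(count)
    labels.append((species, label))
    unique_labels.add(label)  # a set keeps each distinct label once

# A while loop that consumes a list until it is empty, summing the counts.
total = 0
remaining = list(sightings.values())
while remaining:
    total += remaining.pop()
```

After running this, labels holds one (species, label) tuple per species,
unique_labels holds the distinct labels, and total holds the summed counts.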

Tuesday 5th – Classes from 09:30 to 17:30
• Topic 5: Modules, packages, and imports. Python is extended by
hundreds of thousands of additional packages. Here, we will cover how
to install and import these packages, and also how to write our own
modules and packages.
• Topic 6: Numerical programming with numpy. Although not part of
Python’s official standard library, the numpy package is part of
the de facto standard library for any scientific and numerical
programming. Here we will introduce numpy, especially numpy arrays and
their built-in functions (i.e. “methods”).
• Topic 7: Data processing with pandas. The pandas library provides
means to represent and manipulate data frames. Like numpy, pandas can
be seen as part of the de facto standard library for data oriented uses
of Python.
• Topic 8: Object Oriented Programming. Python is an object oriented
language and object oriented programming in Python is extensively used
in anything beyond the very simplest types of programs. Moreover,
compared to other languages, object oriented programming in Python is
relatively easy to learn. Here, we provide a comprehensive
introduction to object oriented programming in Python.
• Topic 9: Other Python programming features. In this section, we will
cover some important features of Python not yet covered. These include
exception handling, list and dictionary comprehensions, itertools,
advanced collection types including defaultdict, anonymous functions,
decorators, etc.
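
The features listed under Topic 9 can be sketched with the standard library
alone. The example below is a minimal illustration under assumed names, not
taken from the course: the logged decorator and the safe_ratio function are
hypothetical and exist only to show the syntax.

```python
from collections import defaultdict
from functools import wraps
from itertools import chain


def logged(func):
    """Hypothetical decorator: record the name of each decorated call."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        logged.calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


logged.calls = []  # simple call log stored as a function attribute


@logged
def safe_ratio(a, b):
    """Exception handling: return None instead of raising on division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


# List and dictionary comprehensions.
squares = [n * n for n in range(5)]
lengths = {word: len(word) for word in ["numpy", "pandas"]}

# defaultdict: missing keys start as an empty list.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# itertools.chain joins several iterables into one.
flattened = list(chain([1, 2], [3]))

# An anonymous (lambda) function used as a sort key: by length, then alphabetically.
by_length = sorted(["pandas", "sympy", "numpy"], key=lambda w: (len(w), w))
```

Using functools.wraps in the decorator keeps the wrapped function's name
and docstring intact, which is a common convention rather than a
requirement.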

Wednesday 6th – Classes from 09:30 to 17:30
• Topic 10: Jupyter notebooks and Jupyterlab. Although we have already
introduced Jupyter notebooks, here we will explore them properly.
Jupyter notebooks are a reproducible and interactive computing
environment that supports numerous programming languages, although
Python remains the principal language used in Jupyter notebooks. Here,
we’ll explore their major features and how they can be shared easily
using GitHub and Binder.
• Topic 11: Data Visualization. Python provides many options for data
visualization. The matplotlib library is a low level plotting library
that allows for considerable control of the plot, albeit at the price
of a considerable amount of low level code. Based on matplotlib, and
providing a much higher level interface to the plot, is the seaborn
library. This allows us to produce complex data visualizations with a
minimal amount of code. Similar to seaborn is ggplot, which is a
direct port of the widely used R based visualization library. In this
section, we will also consider a set of other visualization libraries
for Python. These include plotly, bokeh, and altair.
• Topic 12: Symbolic mathematics. Symbolic mathematics systems, also
known as computer algebra systems, allow us to algebraically
manipulate and solve symbolic mathematical expressions. In Python, the
principal symbolic mathematics library is sympy. This allows us to
simplify mathematical expressions, compute derivatives, integrals, and
limits, solve equations, algebraically manipulate matrices, and more.
• Topic 13: Statistical data analysis. In this section, we will
describe how to perform widely used statistical analyses in Python.
Here we will start with the statsmodels package, which provides linear
and generalized linear models as well as many other widely used
statistical models. We will also introduce the scikit-learn package,
which we will more widely use on Day 4, and use it for regression and
classification analysis.

Thursday 7th – Classes from 09:30 to 17:30
• Topic 14: Machine learning. Python is arguably the most widely used
language for machine learning. In this section, we will explore some
of the major Python machine learning tools that are part of the
scikit-learn package. This section continues our coverage of this
package that began in Topic 13 on Day 3. Here, we will cover machine
learning tools such as support vector machines, decision trees, random
forests, k-means clustering, dimensionality reduction, model
evaluation, and cross-validation.
• Topic 15: Neural networks and deep learning. A popular subfield of
machine learning involves the use of artificial neural networks and
deep learning methods. In this section, we will explore neural
networks and deep learning using the keras library, which is a high
level interface to neural network and deep learning libraries such as
Tensorflow, Theano, or the Microsoft Cognitive Toolkit (CNTK).
Examples that we will consider here include image classification and
other classification problems taken from, for example, the UCI Machine
Learning Repository.

Friday 8th – Classes from 09:30 to 16:00
• Topic 16: Bayesian models. Two probabilistic programming languages
for Bayesian modelling in Python are PyMC3 and PyStan. PyMC3 is a
Python native probabilistic programming language, while PyStan is the
Python interface to the Stan programming language, which is also very
widely used in R. Both PyMC3 and PyStan are extremely powerful tools
and can implement arbitrary probabilistic models. Here, we will not
have time to explore either in depth, but will be able to work through
a number of nontrivial examples, which will illustrate the general
features and usage of both languages.
• Topic 17: High performance Python. The final topic that we will
consider in this course is high performance computing with Python.
While many of the tools that we considered above run extremely quickly
because they interface with compiled code written in C/C++ or Fortran,
Python itself is a high level dynamically typed and interpreted
programming language. As such, native Python code does not execute as
fast as compiled languages such as C/C++ or Fortran. However, it is
possible to achieve compiled language speeds in Python by compiling
Python code. Here, we will consider Cython and Numba, both of which
allow us to achieve C/C++ speeds in Python with minimal extensions to our
code. Also, in this section, we will consider parallelization in
Python, in particular using IPyParallel and Dask, both of which allow
easy parallel and distributed processing using Python.


-- 
Oliver Hooker PhD.
PS statistics

2020 publications;
Parallelism in eco-morphology and gene expression despite variable
evolutionary and genomic backgrounds in a Holarctic fish. PLOS GENETICS
(2020). IN PRESS

www.PSstatistics.com
facebook.com/PSstatistics/
twitter.com/PSstatistics

53 Morrison Street
Glasgow
G5 8LB
+44 (0) 7966500340
_______________________________________________
MARMAM mailing list
[email protected]
https://lists.uvic.ca/mailman/listinfo/marmam
