Hi all,

As discussed a few times in the past, we have the possibility of
enabling the Wiki on the GitHub repository. A few of us thought it
would be a nice alternative to the obsolete architecture manual
because it would allow a number of people to contribute to various
areas with relative ease.

So today I gave it some deeper thought and figured that not only
should the architecture manual go there, but also some recommendations
or guidance about how to test certain things.

So in order to start with something, I'm proposing that we create a
rough plan along the lines of the one below (but I'm totally open to
criticism, feel free to chime in). I'd rather avoid having
placeholders that will never be filled ("this site is under
construction" for the old ones like me), and I'm not sure we can save
drafts, so maybe we can simply place this into a separate document
that serves as the initial plan for reference.

If we manage to go far enough with this, we'll finally be able to kill
the architecture manual 12 years after its last update.

Please share ideas / dos / don'ts. However, please keep in mind that
wish lists are best served when the requester proposes to handle them
him/herself ;-)

Thanks,
Willy

---------------------------------------------------------------

Project organization
--------------------

Team

Places
  - haproxy.org
  - mailing list
  - discourse
  - github

Release cycle
  - development
  - stable
  - LTS

Contributing code
  - read CONTRIBUTING
  - read coding-style
  - read git log

Participating with no code
  - read problem reports
  - review / adjust patches
  - help others
  - contribute to the wiki
  - test the code
  - suggest use cases
  - report issues, gdb traces
  - bisect issues


Architecture manual
-------------------

Presentation

How a proxy works

Terminology
  - client
  - server
  - frontend / service
  - backend / farm
  - active / backup
  - connection, session, transaction, request, response

Topologies
  - edge + short silos
  - central LB + a bunch of servers, multiple layers
  - service clusters (stacks of [haproxy + servers])
  - sidecar

Setting up HA for haproxy
  - keepalived / ucarp / pacemaker ?
  - LVS
  - ECMP
  - ELB

Common use cases
  1) as a basic proxy
    - IPv6 to IPv4 gatewaying
    - port filtering
    - TLS enforcement / cert validation
    - protocol inspection (e.g. HTTP+SSH, SMTP banner delay)
    - authentication
    - transparent proxying
    - logging / anomaly detection / time measurement
    - DoS protection (stick tables, tarpit)
    - traffic aggregation (multiple interfaces attachment)
    - traffic limitation (maxconn)
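
  As an illustration, a minimal config sketch covering two of the
  points above (IPv6-to-IPv4 gatewaying plus connection limiting; all
  names and addresses below are made up):

  ```
  # hypothetical example: accept v4+v6 on the edge,
  # forward to a v4-only server
  frontend fe_gateway
      bind :::443 v4v6
      mode tcp
      maxconn 1000               # traffic limitation
      default_backend be_legacy

  backend be_legacy
      mode tcp
      server legacy1 192.0.2.20:443 check
  ```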

  2) as an accelerating proxy
    - TLS offloading
    - traffic compression
    - response caching
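
  A sketch combining the three points above (certificate path, cache
  name and sizes are placeholders, not recommendations):

  ```
  frontend fe_tls
      bind :443 ssl crt /etc/haproxy/certs/site.pem alpn h2,http/1.1
      mode http
      compression algo gzip
      compression type text/html text/css application/javascript
      http-request cache-use small_objects
      http-response cache-store small_objects
      default_backend be_app

  backend be_app
      mode http
      server app1 192.0.2.10:8080 check

  cache small_objects
      total-max-size 64          # MB
      max-object-size 10000      # bytes
      max-age 60                 # seconds
  ```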

  3) as a load balancer
    - classical stateless L7 LB
    - classical stateful L7 LB
    - when to use round robin  -> short requests / web applications
    - when to use least conn   -> long sessions
    - when to use first        -> ephemeral VMs, fast scale-in/scale-out
    - when to use hashing      -> affinity (e.g. caches)
    - consistent vs map-based hashing
    - persistence vs hashing
    - inbound vs outbound load balancing
    - backup server(s)
    - grouping traffic to a single server (active/backup for databases)
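
  Each "when to use" item above maps to a one-line balance directive;
  a sketch with invented backend/server names:

  ```
  backend be_web             # short requests / web applications
      balance roundrobin
      server web1 192.0.2.11:80 check
      server web2 192.0.2.12:80 check

  backend be_terminals       # long sessions
      balance leastconn
      server t1 192.0.2.21:3389 check
      server t2 192.0.2.22:3389 check

  backend be_vms             # ephemeral VMs, fast scale-in/out
      balance first
      server vm1 192.0.2.31:80 check
      server vm2 192.0.2.32:80 check

  backend be_caches          # affinity via consistent hashing
      balance uri
      hash-type consistent
      server c1 192.0.2.41:80 check
      server c2 192.0.2.42:80 check
  ```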

Advanced use cases
  - providing TLS to Varnish (in + out)
  - caching clusters with consistent hashing and small object caching
  - H2 in front of Nginx (max-reuse)
  - using priorities to speed up critical parts of a site
  - service discovery via DNS, CLI, Lua
  - managing certificates at scale / let's encrypt
  - tuning for extreme loads. pitfalls.
  - accessing services inside Linux containers using namespaces
  - multi-site abuser eviction (stick-tables + peers)
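
  For the last point, a rough sketch of what stick-tables + peers
  could look like (peer names, addresses and the rate threshold are
  all invented for illustration):

  ```
  peers lb_peers
      peer lb1 192.0.2.1:10000
      peer lb2 192.0.2.2:10000

  backend st_abusers
      stick-table type ip size 100k expire 10m \
                  store http_req_rate(10s) peers lb_peers

  frontend fe_main
      bind :80
      mode http
      http-request track-sc0 src table st_abusers
      http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
      default_backend be_app

  backend be_app
      mode http
      server app1 192.0.2.10:8080 check
  ```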

Scripting in Lua

On the fly management
  - stats page
  - CLI
  - signals
  - master-worker
  - agent-check
  - add-acl/del-acl
  - DNS
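
  A couple of CLI interactions could illustrate this section (the
  socket path and backend/server names are assumptions):

  ```
  $ echo "show info" | socat stdio /var/run/haproxy.sock
  $ echo "disable server be_app/app1" | socat stdio /var/run/haproxy.sock
  $ echo "show table st_abusers" | socat stdio /var/run/haproxy.sock
  ```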

Operating system specificities
  - Linux >= 3.9 : SO_REUSEPORT
  - Linux >= 4.2 : IP_BIND_ADDRESS_NO_PORT

Performance considerations
  - orders of magnitude for a few typical metrics
  - cost of processing for various operations
  - cost of traversal for various topologies
  - optimizing for lowest latency
  - optimizing for highest throughput
  - optimizing for TCO


Benchmarks
----------

Principles
  - what
  - why
  - when
  - beware of audience

Conducting a benchmark
  - define purpose
  - define expected metrics
  - define ideal conditions
  - take note of real conditions
  - ensure reproducibility / minimise noise
  - problems are part of the process
  - report

Archived results
  - one per page : date, title, report


Testing new features
--------------------


---------------------------------------------------------------

