also llama.cpp is better in many ways

but in python, with huggingface's accelerate and transformers packages,
passing device_map='auto' to from_pretrained will spread the model
across gpu vram and cpu ram, giving you more total memory to work with,
and loading is fast (mmap-based) if the checkpoint is in safetensors
format
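a minimal sketch of what that looks like, assuming you have transformers,
accelerate, and torch installed; the model id and memory caps here are
placeholders, not anything specific -- max_memory is the optional knob
that caps how much each device gets before spilling to the next:

```python
# Hypothetical sketch: split a model across GPU and CPU RAM with
# accelerate's device_map="auto". Model id and memory caps are
# made-up examples, substitute your own.
def build_load_kwargs(gpu_gib=None, cpu_gib=None):
    """Build from_pretrained kwargs for auto device placement.

    Without max_memory, accelerate fills devices greedily; with it,
    each device is capped and the rest spills to the next one.
    """
    kwargs = {"device_map": "auto"}
    if gpu_gib is not None and cpu_gib is not None:
        kwargs["max_memory"] = {0: f"{gpu_gib}GiB", "cpu": f"{cpu_gib}GiB"}
    return kwargs


if __name__ == "__main__":
    from transformers import AutoModelForCausalLM

    # safetensors checkpoints load via mmap, so this is fast and
    # doesn't double the RAM needed during load
    model = AutoModelForCausalLM.from_pretrained(
        "some-org/some-model",          # placeholder model id
        **build_load_kwargs(gpu_gib=10, cpu_gib=48),
    )
    print(model.hf_device_map)          # shows which layer went where
```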

note that huggingface's libs do tend to be somewhat crippled,
user-focused things, which is maybe why i know them