In their paper ( https://arxiv.org/pdf/2208.00635.pdf ) they say the
highest scores on CommonsenseQA were achieved with what they call
"DictRoBERTa + LWA(K+V)".

LWA means "Layer-wise Extra-hop Attention"

....

well i misplaced that.

i think i'll try to adapt bloom-560m to do this.
my plan is to give it a small dataset that i add to by hand,
have it split the dataset into train/test, and keep training an adapter
for as long as the loss on the test split keeps dropping.

i infer there is something wrong with that plan, but it's a start
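
a rough sketch of the split-and-stop part of that plan, in plain python
with no model attached. the names split_dataset,
train_while_test_loss_drops, train_step, eval_loss and patience are all
made up for illustration; the real adapter training on bloom-560m would
sit behind the two callbacks and is not shown here.

```python
import random


def split_dataset(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Always hold out at least one example, even for tiny datasets.
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_while_test_loss_drops(train_step, eval_loss, train, test,
                                max_epochs=100, patience=1):
    """Run train_step(train) repeatedly, stopping once eval_loss(test)
    has failed to improve for `patience` evaluations in a row.

    train_step and eval_loss are hypothetical callbacks supplied by the
    caller; in the plan above they would wrap one pass of adapter
    training and one loss evaluation on the held-out split.
    """
    best = float("inf")
    bad_evals = 0
    history = []
    for _ in range(max_epochs):
        train_step(train)
        loss = eval_loss(test)
        history.append(loss)
        if loss < best:
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                break
    return best, history
```

with patience=1 this stops at the first evaluation where the held-out
loss does not improve; a larger patience tolerates a few noisy
evaluations before giving up.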